Seeing Text in the Dark: Algorithm and Benchmark (2404.08965v3)
Abstract: Localizing text in low-light environments is challenging due to visual degradations. Although a straightforward solution involves a two-stage pipeline with low-light image enhancement (LLE) as the initial step followed by a detector, LLE is primarily designed for human rather than machine vision and can accumulate errors. In this work, we propose an efficient and effective single-stage approach for localizing text in the dark that circumvents the need for LLE. We introduce a constrained learning module as an auxiliary mechanism during the training stage of the text detector. This module is designed to guide the text detector in preserving textual spatial features during feature map resizing, thus minimizing the loss of spatial information in text under low-light visual degradations. Specifically, we incorporate spatial reconstruction and spatial semantic constraints within this module to ensure the text detector acquires essential positional and contextual range knowledge. Our approach enhances the original text detector's ability to identify the local topological features of text using a dynamic snake feature pyramid network, and adopts a bottom-up contour shaping strategy with a novel rectangular accumulation technique for accurate delineation of streamlined text features. In addition, we present a comprehensive low-light dataset for arbitrary-shaped text, encompassing diverse scenes and languages. Notably, our method achieves state-of-the-art results on this low-light dataset and exhibits comparable performance on standard normal-light datasets. The code and dataset will be released.
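The idea of a training-only auxiliary constraint on feature map resizing can be illustrated with a toy example. The sketch below is not the paper's constrained learning module: it assumes a single-channel feature map, 2x2 average pooling, nearest-neighbour upsampling, and a mean-squared-error penalty, and all function names and the `weight` parameter are hypothetical. It only shows why a thin, stroke-like structure loses spatial information when a map is resized, and how such a loss could be added to a detection loss during training.

```python
from typing import List

Grid = List[List[float]]


def downsample2x(fmap: Grid) -> Grid:
    """Average-pool a 2D map with a 2x2 window (even height and width assumed)."""
    h, w = len(fmap), len(fmap[0])
    return [
        [
            (fmap[i][j] + fmap[i][j + 1] + fmap[i + 1][j] + fmap[i + 1][j + 1]) / 4.0
            for j in range(0, w, 2)
        ]
        for i in range(0, h, 2)
    ]


def upsample2x(fmap: Grid) -> Grid:
    """Nearest-neighbour upsampling: repeat every value in a 2x2 block."""
    out: Grid = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def reconstruction_loss(fmap: Grid) -> float:
    """Mean squared error between a map and its down-then-up-sampled copy."""
    recon = upsample2x(downsample2x(fmap))
    n = len(fmap) * len(fmap[0])
    return sum(
        (a - b) ** 2 for ra, rb in zip(fmap, recon) for a, b in zip(ra, rb)
    ) / n


def training_loss(det_loss: float, fmap: Grid, weight: float = 0.5) -> float:
    """Detection loss plus a weighted auxiliary term (hypothetical, training only)."""
    return det_loss + weight * reconstruction_loss(fmap)


# A flat region survives resizing; a one-pixel-wide vertical stroke does not.
flat = [[1.0] * 4 for _ in range(4)]
stroke = [[0.0, 1.0, 0.0, 0.0] for _ in range(4)]
print(reconstruction_loss(flat), reconstruction_loss(stroke))
```

In this toy setting the auxiliary term is dropped at inference time, so it adds no cost to the detector itself, which mirrors the abstract's description of the module as a training-stage mechanism.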
- Chengpei Xu
- Hao Fu
- Long Ma
- Wenjing Jia
- Chengqi Zhang
- Feng Xia
- Xiaoyu Ai
- Binghao Li
- Wenjie Zhang