Efficient scene text image super-resolution with semantic guidance
Abstract: Scene text image super-resolution has significantly improved the accuracy of scene text recognition. However, many existing methods emphasize performance over efficiency and ignore the practical need for lightweight solutions in deployment scenarios. To address this, we propose an efficient framework called SGENet that facilitates deployment on resource-limited platforms. SGENet contains two branches: a super-resolution branch and a semantic guidance branch. We apply a lightweight pre-trained recognizer as a semantic extractor to enhance the understanding of text information. We also design a visual-semantic alignment module that achieves bidirectional alignment between image features and semantics, yielding high-quality prior guidance. We conduct extensive experiments on a benchmark dataset, and the proposed SGENet achieves excellent performance at a lower computational cost. Code is available at https://github.com/SijieLiu518/SGENet
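The abstract does not specify how the visual-semantic alignment module is built. As a rough illustration of what bidirectional alignment between image features and semantics could look like, the sketch below uses plain cross-attention in both directions with a residual connection. Everything in it is an assumption made for illustration: the function names (`cross_attend`, `bidirectional_align`), the absence of learned projections, and the toy feature values are hypothetical and are not taken from SGENet or its repository.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries, keys_values):
    """Let each query vector attend over keys_values (scaled dot-product),
    then add the attended context back to the query (residual update).
    Simplification: no learned projection matrices are used here."""
    d = len(queries[0])
    scale = 1.0 / math.sqrt(d)
    updated = []
    for q in queries:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys_values]
        weights = softmax(scores)
        context = [
            sum(w * kv[j] for w, kv in zip(weights, keys_values)) for j in range(d)
        ]
        updated.append([qi + ci for qi, ci in zip(q, context)])
    return updated


def bidirectional_align(image_feats, semantic_feats):
    """Hypothetical two-way alignment: semantics attend to the image features,
    and image features attend to the semantics. Both directions read the
    original (un-updated) features."""
    aligned_semantics = cross_attend(semantic_feats, image_feats)
    aligned_image = cross_attend(image_feats, semantic_feats)
    return aligned_image, aligned_semantics


# Toy inputs: 3 image positions and 2 semantic tokens, feature dimension 2.
image_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
semantic_feats = [[0.5, 0.5], [1.0, 0.0]]
aligned_image, aligned_semantics = bidirectional_align(image_feats, semantic_feats)
```

Each output keeps the shape of its input, so in a design like this the aligned semantics could be passed on as prior guidance while the aligned image features stay in the super-resolution branch. Whether SGENet wires the two directions this way is not stated in the abstract.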