Efficient scene text image super-resolution with semantic guidance

Published 20 Mar 2024 in cs.CV | (2403.13330v1)

Abstract: Scene text image super-resolution has significantly improved the accuracy of scene text recognition. However, many existing methods prioritize performance over efficiency and ignore the practical need for lightweight solutions in deployment scenarios. To address these issues, we propose an efficient framework called SGENet that facilitates deployment on resource-limited platforms. SGENet contains two branches: a super-resolution branch and a semantic guidance branch. We use a lightweight pre-trained recognizer as a semantic extractor to enhance the understanding of text information. In addition, we design a visual-semantic alignment module that achieves bidirectional alignment between image features and semantics, producing high-quality prior guidance. We conduct extensive experiments on the benchmark dataset, and the proposed SGENet achieves excellent performance at a lower computational cost. Code is available at https://github.com/SijieLiu518/SGENet
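The abstract describes the visual-semantic alignment module only at a high level. To give one concrete reading of "bidirectional alignment between image features and semantics", here is a minimal sketch that uses plain scaled dot-product cross-attention in both directions with residual connections. This is an assumption, not SGENet's actual module: the function names, the lack of learned query/key/value projections, and the residual form are all hypothetical. The semantic features are assumed to come from the pre-trained recognizer, and the image features from the super-resolution branch. For the real design, consult the official repository.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]  # row-major: one row per token / spatial position


def matmul(a: Matrix, b: Matrix) -> Matrix:
    # (n x k) @ (k x m) -> (n x m)
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def softmax_rows(a: Matrix) -> Matrix:
    # Numerically stable softmax applied independently to each row.
    out = []
    for row in a:
        m = max(row)
        exps = [math.exp(x - m) for x in row]
        s = sum(exps)
        out.append([e / s for e in exps])
    return out


def add(a: Matrix, b: Matrix) -> Matrix:
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def cross_attend(queries: Matrix, context: Matrix) -> Matrix:
    # softmax(Q K^T / sqrt(d)) V, with K = V = context (no learned projections,
    # a simplifying assumption of this sketch).
    d = len(queries[0])
    scores = matmul(queries, transpose(context))
    scores = [[s / math.sqrt(d) for s in row] for row in scores]
    return matmul(softmax_rows(scores), context)


def bidirectional_align(image_feats: Matrix, semantic_feats: Matrix) -> Tuple[Matrix, Matrix]:
    # Hypothetical alignment step:
    # image -> semantics: each image position gathers information from the text tokens.
    img_aligned = add(image_feats, cross_attend(image_feats, semantic_feats))
    # semantics -> image: each text token gathers information from the image positions.
    sem_aligned = add(semantic_feats, cross_attend(semantic_feats, image_feats))
    return img_aligned, sem_aligned


if __name__ == "__main__":
    image = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 spatial positions, d = 2
    semantics = [[0.5, 0.5], [1.0, -1.0]]         # 2 character tokens, d = 2
    img_out, sem_out = bidirectional_align(image, semantics)
    print(len(img_out), len(sem_out))  # shapes are preserved in both directions
```

Each direction keeps the shape of its queries, so the aligned image features can feed back into the super-resolution branch as prior guidance while the aligned semantic tokens remain available for the other direction. A real module would typically add learned projections, multiple heads, and normalization.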
