
MENTOR: Multilingual tExt detectioN TOward leaRning by analogy (2403.07286v1)

Published 12 Mar 2024 in cs.CV

Abstract: Text detection is frequently used in vision-based mobile robots that need to interpret text in their surroundings to perform a given task. For instance, delivery robots in multilingual cities need multilingual text detection so that they can read traffic signs and road markings. Moreover, the target languages change from region to region, so models must be efficiently re-trained to recognize novel languages. However, collecting and labeling training data for novel languages is cumbersome, and re-training an existing text detector takes considerable effort. Even worse, this routine must be repeated whenever a novel language appears. This motivates us to propose a new problem setting that tackles these challenges more efficiently: "We ask for a generalizable multilingual text detection framework to detect and identify both seen and unseen language regions inside scene images without requiring supervised training data for unseen languages or model re-training". To this end, we propose "MENTOR", the first work to realize a learning strategy between zero-shot learning and few-shot learning for multilingual scene text detection.
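The abstract does not describe MENTOR's architecture. As a rough illustration of what identifying a language "by analogy" without re-training could look like, the sketch below assigns a detected text region to the language whose reference embedding (prototype) it most resembles, in the style of prototypical networks. This is not the paper's method. Everything here is an assumption: the 3-dimensional vectors are hand-made stand-ins for features a trained detector would produce, and the names `build_prototypes`, `identify_language`, and the 0.5 similarity threshold are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def prototype(examples: List[Vector]) -> List[float]:
    """Mean embedding of a few example regions for one language."""
    n = len(examples)
    return [sum(column) / n for column in zip(*examples)]


def build_prototypes(support: Dict[str, List[Vector]]) -> Dict[str, List[float]]:
    """Hypothetical helper: one prototype per language from a small support set."""
    return {lang: prototype(vectors) for lang, vectors in support.items()}


def identify_language(
    region: Vector, prototypes: Dict[str, List[float]], threshold: float = 0.5
) -> str:
    """Return the most similar language, or "unknown" if none clears the threshold."""
    best_lang, best_sim = "unknown", threshold
    for lang, proto in prototypes.items():
        sim = cosine(region, proto)
        if sim > best_sim:
            best_lang, best_sim = lang, sim
    return best_lang


# Illustrative (made-up) embeddings for two scripts.
support = {
    "latin": [[1.0, 0.1, 0.0], [0.9, 0.0, 0.1]],
    "arabic": [[0.0, 1.0, 0.1], [0.1, 0.9, 0.0]],
}
prototypes = build_prototypes(support)
print(identify_language([0.95, 0.05, 0.05], prototypes))  # latin
print(identify_language([0.0, 0.0, 1.0], prototypes))     # unknown

# Supporting a new language only adds a prototype; no parameters are re-trained.
support["devanagari"] = [[0.0, 0.1, 1.0], [0.1, 0.0, 0.9]]
print(identify_language([0.0, 0.0, 1.0], build_prototypes(support)))  # devanagari
```

The design choice illustrated is that the comparison function stays fixed while the set of reference examples grows, which is one way to avoid re-training when a new language appears; a real system would learn the embedding and would also need to localize text regions first.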

