DiffusionSTR: Diffusion Model for Scene Text Recognition (2306.16707v1)
Published 29 Jun 2023 in cs.CV
Abstract: This paper presents Diffusion Model for Scene Text Recognition (DiffusionSTR), an end-to-end text recognition framework that uses diffusion models to recognize text in the wild. Whereas existing studies treat scene text recognition as an image-to-text transformation, we recast it as a text-to-text transformation conditioned on images within a diffusion model. We show for the first time that diffusion models can be applied to text recognition. Furthermore, experimental results on publicly available datasets show that the proposed method achieves accuracy competitive with state-of-the-art methods.