Improving Handwritten Text Recognition via 3D Attention and Multi-Scale Training (2410.18374v2)
Abstract: Segmentation-free approaches to handwritten text recognition fall into three categories: connectionist temporal classification (CTC), hidden Markov model (HMM), and encoder-decoder methods. Inspired by these three modeling methods, we propose a new recognition network that uses a novel three-dimensional (3D) attention module together with global-local context information. The feature maps of the last convolutional layer are split into a series of 3D blocks with different resolutions. These 3D blocks are then fed into the 3D attention module to generate sequential visual features. Finally, the visual features are integrated with the corresponding global-local context features to obtain a well-designed representation. The main canonical neural units, including attention mechanisms, fully-connected layers, recurrent units, and convolutional layers, are organized into a single network that can be jointly trained with the CTC loss and the cross-entropy loss. Experiments on two recent Chinese handwritten text datasets (SCUT-HCCDoc and SCUT-EPT) and one English handwritten text dataset (IAM) show that the proposed method achieves results comparable to state-of-the-art methods. The code is available at https://github.com/Wukong90/3DAttention-MultiScaleTraining-for-HTR.
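To make the training setup in the abstract more concrete, the following is a minimal sketch and not the authors' implementation. It shows the standard CTC mapping from frame-level labels to an output sequence (merge consecutive repeats, then drop blanks), and one possible way to combine the two losses. The blank index, the function names `ctc_collapse` and `joint_loss`, and the weighted-sum form with `ce_weight` are all assumptions made for illustration; the abstract states only that the network is jointly trained with both losses.

```python
from itertools import groupby
from typing import List, Sequence

BLANK = 0  # assumed index of the CTC blank label (hypothetical choice)


def ctc_collapse(frame_labels: Sequence[int], blank: int = BLANK) -> List[int]:
    """Apply the standard CTC mapping: merge consecutive repeats, then drop blanks."""
    merged = [label for label, _ in groupby(frame_labels)]
    return [label for label in merged if label != blank]


def joint_loss(ctc_loss: float, ce_loss: float, ce_weight: float = 1.0) -> float:
    """Hypothetical joint objective: a weighted sum of the CTC and cross-entropy losses."""
    return ctc_loss + ce_weight * ce_loss


if __name__ == "__main__":
    # A blank between two identical labels keeps both of them in the output.
    print(ctc_collapse([0, 3, 3, 0, 3, 5, 5, 0]))  # [3, 3, 5]
    print(joint_loss(2.0, 1.0, ce_weight=0.5))     # 2.5
```

In this sketch the blank label is what lets CTC emit the same character twice in a row, which is why the repeated `3` survives only when a blank separates the two runs.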