GatedLexiconNet: A Comprehensive End-to-End Handwritten Paragraph Text Recognition System (2404.14062v1)
Abstract: Handwritten Text Recognition (HTR) has challenged researchers for the last few decades, especially in computer vision and pattern recognition. Variability among writers, cursive writing, differing handwriting styles, and the degradation of historical document images make it a difficult problem. Neural network-based systems typically recognize scanned document images in two steps: segmentation, then recognition. However, this approach has several drawbacks, including difficulty in identifying text regions, handling layout diversity within pages, and obtaining accurate ground-truth segmentation. These steps are therefore error-prone and become bottlenecks to achieving high recognition accuracy. In this study, we present an end-to-end paragraph recognition system that incorporates internal line segmentation and an encoder based on gated convolutional layers. Gating is a mechanism that controls the flow of information and lets a handwritten text recognition model adaptively select the more relevant features. The attention module performs the internal line segmentation, allowing the page to be processed line by line. During decoding, we integrate a connectionist temporal classification (CTC)-based word beam search decoder as a post-processing step. This work extends the existing LexiconNet by carefully incorporating gated convolutional layers into its deep neural network. Our results at both line and page levels favour the new GatedLexiconNet. This study reports character error rates of 2.27% on IAM, 0.9% on RIMES, and 2.13% on READ-2016, and word error rates of 5.73% on IAM, 2.76% on RIMES, and 6.52% on READ-2016.
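The gating idea described above can be illustrated with a minimal sketch. The abstract does not specify the exact gate formulation, so the code assumes a GLU-style gate, h = A ⊙ σ(B), where A and B are two parallel convolutions of the same input. This is one common formulation, not necessarily the one used in GatedLexiconNet. For readability the sketch is reduced to a single channel and one dimension, whereas a real encoder operates on multi-channel 2-D feature maps; the function names `conv1d_same` and `gated_conv1d` are hypothetical.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function, mapping any real z into (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def conv1d_same(x: Sequence[float], kernel: Sequence[float], bias: float = 0.0) -> List[float]:
    """1-D cross-correlation with zero padding so the output has the input's length."""
    k = len(kernel)
    if k % 2 == 0:
        raise ValueError("this sketch assumes an odd kernel length")
    pad = k // 2
    out = []
    for i in range(len(x)):
        acc = bias
        for j in range(k):
            idx = i + j - pad
            if 0 <= idx < len(x):  # positions outside the input count as zero
                acc += kernel[j] * x[idx]
        out.append(acc)
    return out


def gated_conv1d(
    x: Sequence[float],
    feat_kernel: Sequence[float],
    gate_kernel: Sequence[float],
    feat_bias: float = 0.0,
    gate_bias: float = 0.0,
) -> List[float]:
    """Assumed GLU-style gated convolution: features A scaled by gates sigmoid(B)."""
    a = conv1d_same(x, feat_kernel, feat_bias)  # candidate features A
    b = conv1d_same(x, gate_kernel, gate_bias)  # gate pre-activations B
    return [ai * sigmoid(bi) for ai, bi in zip(a, b)]


if __name__ == "__main__":
    x = [0.0, 1.0, 2.0, 1.0, 0.0]
    identity = [0.0, 1.0, 0.0]
    # Zero gate kernel and zero bias give sigmoid(0) = 0.5 everywhere.
    print(gated_conv1d(x, identity, [0.0, 0.0, 0.0]))  # → [0.0, 0.5, 1.0, 0.5, 0.0]
```

Because each gate value lies in (0, 1), the gate scales the candidate features rather than replacing them: a gate near 1 passes a feature through, and a gate near 0 suppresses it. Since the gate kernel is learned, the network decides per position (and, in a multi-channel layer, per channel) which features to keep.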
Authors:
- Lalita Kumari
- Sukhdeep Singh
- Vaibhav Varish Singh Rathore
- Anuj Sharma