VATr++: Choose Your Words Wisely for Handwritten Text Generation (2402.10798v1)
Abstract: Styled Handwritten Text Generation (HTG) has received significant attention in recent years, propelled by the success of learning-based solutions employing GANs, Transformers, and, more recently, preliminary explorations of Diffusion Models. Despite this surge in interest, a critical aspect remains understudied: the impact of the input, both visual and textual, on HTG model training and its subsequent influence on performance. This study delves deeper into a cutting-edge Styled-HTG approach, proposing strategies for input preparation and training regularization that improve both the model's performance and its generalization. These aspects are validated through extensive analysis across several different settings and datasets. Moreover, in this work, we go beyond performance optimization and address a significant hurdle in HTG research: the lack of a standardized evaluation protocol. In particular, we propose a standardization of the evaluation protocol for HTG and conduct a comprehensive benchmarking of existing approaches. By doing so, we aim to establish a foundation for fair and meaningful comparisons between HTG strategies, fostering progress in the field.