Two-stage generative adversarial networks for document image binarization with color noise and background removal (2010.10103v3)

Published 20 Oct 2020 in cs.CV, cs.AI, and cs.LG

Abstract: Document image enhancement and binarization methods are often used to improve the accuracy and efficiency of document image analysis tasks such as text recognition. Traditional non-machine-learning methods are constructed on low-level features in an unsupervised manner but have difficulty with binarization on documents with severely degraded backgrounds. Convolutional neural network-based methods focus only on grayscale images and on local textual features. In this paper, we propose a two-stage color document image enhancement and binarization method using generative adversarial neural networks. In the first stage, four color-independent adversarial networks are trained to extract color foreground information from an input image for document image enhancement. In the second stage, two independent adversarial networks with global and local features are trained for image binarization of documents of variable size. For the adversarial neural networks, we formulate loss functions between a discriminator and generators having an encoder-decoder structure. Experimental results show that the proposed method achieves better performance than many classical and state-of-the-art algorithms over the Document Image Binarization Contest (DIBCO) datasets, the LRDE Document Binarization Dataset (LRDE DBD), and our shipping label image dataset. We plan to release the shipping label dataset as well as our implementation code at github.com/opensuh/DocumentBinarization/.

Authors

Sungho Suh
Jihun Kim
Paul Lukowicz
Yong Oh Lee

Summary

Two-Stage Generative Adversarial Networks for Document Image Binarization with Color Noise and Background Removal

This paper presents a two-stage framework for document image binarization built on generative adversarial networks (GANs). Traditional methods such as Otsu's global thresholding or Sauvola's local adaptive thresholding rely on low-level features and struggle to separate text from severely degraded or color-varied backgrounds, while earlier convolutional approaches focus only on grayscale images and local textual features. The proposed method instead uses adversarial training to separate the document foreground from diverse backgrounds in color images.
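To make the contrast with classical thresholding concrete, the following is a minimal sketch of Otsu's global threshold in NumPy. It is the textbook algorithm, not code from the paper; the function names are chosen here for illustration, and the input is assumed to be an 8-bit grayscale array.

```python
import numpy as np

def otsu_threshold(gray: np.ndarray) -> int:
    """Return the Otsu threshold for an 8-bit grayscale image."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                    # probability of the "dark" class
    mu = np.cumsum(prob * np.arange(256))      # cumulative intensity mean
    mu_total = mu[-1]
    denom = omega * (1.0 - omega)
    # Between-class variance for every candidate threshold.
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_total * omega - mu) ** 2 / denom
    sigma_b[denom == 0] = 0.0                  # thresholds that leave one class empty
    return int(np.argmax(sigma_b))

def binarize_global(gray: np.ndarray) -> np.ndarray:
    """Apply one threshold to the whole page: 0 = dark (text), 1 = bright."""
    return (gray > otsu_threshold(gray)).astype(np.uint8)
```

Because a single threshold is applied to the whole page, stains, shadows, or colored stamps whose intensity overlaps with the text end up on the wrong side of it, which is the failure mode the summary describes.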

Methodology

The authors propose a comprehensive two-stage GAN framework designed to enhance and binarize document images characterized by color noise and complex backgrounds:

Stage One - Document Image Enhancement:
- This first stage uses four color-independent adversarial networks: one each for the red, green, and blue channels and one for a grayscale version of the image. Through adversarial learning, each network is trained to extract foreground (text) information from color document images while suppressing background interference. Each network is conditioned on the ground truth for its own channel.
Stage Two - Image Binarization:
- After channel-specific enhancement, the second stage uses two further independent adversarial networks for binarization, one working on local features and one on global features. Local binarization operates on patches of the enhanced image output by the first stage, while global binarization operates on the original full-size image. Combining the two brings in wider context, which reduces misclassification and improves text legibility, and lets the method handle documents of variable size.
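The input preparation implied by the two stages can be sketched as follows. This is only an illustration of the data flow, not the authors' implementation: the networks themselves are omitted, the grayscale conversion uses the common BT.601 luma weights, and the 256-pixel patch size and white padding are assumptions made here.

```python
import numpy as np

def split_channels(rgb: np.ndarray) -> dict:
    """Split an HxWx3 uint8 RGB image into the four planes used in stage one."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    # BT.601 luma weights (assumed; the summary does not state the conversion).
    gray = (0.299 * r + 0.587 * g + 0.114 * b).round().astype(np.uint8)
    return {"red": r, "green": g, "blue": b, "gray": gray}

def extract_patches(plane: np.ndarray, patch: int = 256) -> np.ndarray:
    """Pad a 2-D plane with white and cut it into non-overlapping square tiles."""
    h, w = plane.shape
    pad_h, pad_w = (-h) % patch, (-w) % patch
    padded = np.pad(plane, ((0, pad_h), (0, pad_w)), constant_values=255)
    rows, cols = padded.shape[0] // patch, padded.shape[1] // patch
    tiles = padded.reshape(rows, patch, cols, patch).swapaxes(1, 2)
    return tiles.reshape(-1, patch, patch)     # tiles in row-major order

# Usage: a blank 300x500 page yields four planes and four 256x256 tiles each.
page = np.full((300, 500, 3), 255, dtype=np.uint8)
planes = split_channels(page)
local_inputs = {name: extract_patches(p) for name, p in planes.items()}
```

In this sketch the per-channel tiles stand in for what the local branch would consume, while the global branch would receive the whole page rather than tiles.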

For training, the authors formulate loss functions between a discriminator and generators that have an encoder-decoder structure, so that each generator is pushed to produce output the discriminator cannot distinguish from the ground truth.
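One common way to set up such a generator objective is to add a pixel-wise term to the adversarial term. The sketch below assumes a Wasserstein-style critic score and a binary cross-entropy term weighted by a hypothetical factor `lam`; the summary does not specify the paper's exact losses, so treat every choice here as an assumption.

```python
import numpy as np

def bce(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """Mean binary cross-entropy between predicted probabilities and a 0/1 target."""
    p = np.clip(pred, eps, 1.0 - eps)
    return float(-np.mean(target * np.log(p) + (1.0 - target) * np.log(1.0 - p)))

def generator_loss(critic_scores: np.ndarray, pred: np.ndarray,
                   target: np.ndarray, lam: float = 50.0) -> float:
    """Adversarial term plus weighted pixel-wise term (lam is a hypothetical weight)."""
    adversarial = -float(np.mean(critic_scores))   # higher critic score lowers the loss
    return adversarial + lam * bce(pred, target)
```

The adversarial term rewards outputs that the discriminator scores as realistic, while the pixel-wise term keeps the output close to the ground-truth map.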

Experimental Evaluation

The framework was evaluated on the Document Image Binarization Contest (DIBCO) datasets, the LRDE Document Binarization Dataset (LRDE DBD), and the authors' own shipping label image dataset, which they plan to release on GitHub together with their implementation. The results indicate better performance than many classical and state-of-the-art methods in terms of F-measure, pseudo-F-measure, PSNR, and DRD. On complex data such as DIBCO and the shipping labels in particular, the proposed method showed substantial improvements not only in text separation but also in the resulting OCR accuracy, which supports its practical value.
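Two of the metrics named above can be computed directly from a predicted map and its ground truth. The following minimal sketch assumes binary arrays in which 1 marks text pixels; pseudo-F-measure and DRD are omitted because they require additional weighting maps, and the function names are illustrative.

```python
import numpy as np

def f_measure(pred: np.ndarray, gt: np.ndarray) -> float:
    """Harmonic mean of precision and recall over text pixels (range 0 to 1)."""
    tp = int(np.sum((pred == 1) & (gt == 1)))
    fp = int(np.sum((pred == 1) & (gt == 0)))
    fn = int(np.sum((pred == 0) & (gt == 1)))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def psnr(pred: np.ndarray, gt: np.ndarray) -> float:
    """Peak signal-to-noise ratio in dB for binary maps (peak difference is 1)."""
    mse = float(np.mean((pred.astype(np.float64) - gt.astype(np.float64)) ** 2))
    if mse == 0.0:
        return float("inf")
    return float(10.0 * np.log10(1.0 / mse))

# Usage: one of two text pixels is missed, with no false positives.
gt = np.array([1, 1, 0, 0])
pred = np.array([1, 0, 0, 0])
print(round(f_measure(pred, gt), 4), round(psnr(pred, gt), 2))  # 0.6667 6.02
```

Benchmark tables usually report F-measure as a percentage, so the value returned here would be multiplied by 100 for comparison.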

Implications and Future Directions

The results for two-stage GANs in document binarization suggest two takeaways for researchers:

Independent Color Channels: Training a separate network per color channel lets the enhancement adapt to different kinds of color noise, which motivates further work on multi-channel GAN architectures.
Adversarial Learning Beyond Generation: The competition between generator and discriminator is useful beyond image synthesis, and here supports foreground extraction and image transformation on highly degraded inputs.

Future work could integrate a text recognition pipeline with the binarization framework to form an end-to-end solution for document processing. Expanding the training data with more varied degradation scenarios could also improve the model's robustness and range of application. Improved GAN training techniques or architectures, for example ones that incorporate reinforcement learning strategies, may further improve accuracy and efficiency.

GitHub

GitHub - opensuh/DocumentBinarization (53 stars)