- The paper presents an iterative framework using recurrent and stacked CNN refinements to enhance degraded document images for improved binarization.
- It demonstrates that preprocessing with iterative enhancement increases the F-measure from 80.01% to 90.00% on the DIBCO 2013 benchmark when binarizing with Otsu's threshold.
- The novel approach improves readability and transcription accuracy for historical documents, benefiting digital archivists and scholars.
Overview of "DeepOtsu: Document Enhancement and Binarization using Iterative Deep Learning"
Sheng He and Lambert Schomaker's paper examines how document binarization can be improved through an iterative deep learning framework, "DeepOtsu". The researchers address the challenges of binarizing degraded historical documents, which often suffer from artifacts and noise that complicate the extraction of clean, readable text.
Unlike binarization methods that directly predict a binary label for each pixel, the proposed method first enhances the degraded image by learning and correcting its degradations iteratively with neural networks. This enhancement step serves as preprocessing that yields higher-quality binarized outputs.
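As a rough illustration of this enhance-then-binarize idea, the sketch below assumes some trained enhancement CNN (the `enhancement_net` callable is a hypothetical placeholder) and applies a standard global Otsu threshold to its output; it is not the authors' exact pipeline.

```python
# Minimal enhance-then-binarize sketch (hypothetical, not the paper's code).
import numpy as np
from skimage.filters import threshold_otsu

def binarize_enhanced(degraded: np.ndarray, enhancement_net) -> np.ndarray:
    """Enhance a degraded grayscale image, then binarize with a global Otsu threshold.

    `enhancement_net` stands in for any trained CNN that maps a degraded
    image to a cleaner, more uniform version of itself.
    """
    enhanced = enhancement_net(degraded)      # learned degradation correction
    t = threshold_otsu(enhanced)              # single global threshold on the enhanced image
    return (enhanced < t).astype(np.uint8)    # 1 = ink (dark text pixels), 0 = background
```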
Iterative Enhancement Through CNNs
The paper explores two distinct iterative methods:
- Recurrent Refinement (RR): Applies the same trained neural network repeatedly, feeding its output back as input to progressively enhance the document image.
- Stacked Refinement (SR): Applies a sequence of separately trained neural networks, each refining the output of the previous one.
Both methods differ fundamentally from traditional approaches in that they learn the degradation patterns present in the image: the network iteratively transforms the input into a uniform, enhanced representation, which makes subsequent binarization with a simple global or local threshold straightforward.
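To make the distinction concrete, here is a minimal PyTorch-style sketch of the two refinement schemes; `net` and `nets` are assumed to be trained enhancement CNNs, and the number of iterations is an illustrative choice rather than the paper's setting.

```python
import torch

@torch.no_grad()
def recurrent_refinement(net, x: torch.Tensor, num_iters: int = 3) -> torch.Tensor:
    """RR: the same trained network is applied repeatedly to its own output."""
    for _ in range(num_iters):
        x = net(x).clamp(0.0, 1.0)  # each pass produces a more uniform image
    return x

@torch.no_grad()
def stacked_refinement(nets, x: torch.Tensor) -> torch.Tensor:
    """SR: a sequence of separately trained networks, each refining the previous output."""
    for net in nets:
        x = net(x).clamp(0.0, 1.0)
    return x
```

In either case the final output can then be binarized with a simple threshold, as in the earlier sketch.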
Experimental Validation
The authors validate their methods on several public benchmark datasets, including the DIBCO series and a newly introduced Monk Cuper Set (MCS). Notable findings, reported with the pixel-level F-measure sketched after this list, include:
- Performance on DIBCO 2013: Applying Otsu's threshold to SR-enhanced images raised the F-measure from 80.01% (on the original images) to 90.00%.
- Performance on MCS: When binarized with Otsu's threshold, the SR-enhanced images improved the F-measure by 13.49% over the original images.
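For context, the F-measure used in DIBCO-style evaluations is the harmonic mean of pixel-level precision and recall over text (ink) pixels. A minimal sketch, assuming binary masks in which 1 marks ink:

```python
import numpy as np

def f_measure(pred: np.ndarray, gt: np.ndarray) -> float:
    """Pixel-level F-measure: harmonic mean of precision and recall,
    where `pred` and `gt` are binary masks with 1 marking text (ink) pixels."""
    tp = np.logical_and(pred == 1, gt == 1).sum()
    fp = np.logical_and(pred == 1, gt == 0).sum()
    fn = np.logical_and(pred == 0, gt == 1).sum()
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
```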
Implications and Future Work
Practically, the DeepOtsu method offers substantial benefits for digital archivists, historians, and other scholars who transcribe and analyze historical documents. The enhanced images and the resulting binarizations are more legible, which is crucial for accurate information extraction.
Theoretically, the iterative deep learning framework opens new avenues for document image processing by emphasizing degradation learning and the separation of enhancement from binarization. Future work could explore more advanced network architectures, such as ResNet or DenseNet, within this iterative framework to further improve performance and handle more complex degradations. Additionally, adapting the model to larger patch sizes could allow for more context-aware enhancement of documents with extensive smearing or large artifacts.
Conclusion
He and Schomaker's work exemplifies a novel application of iterative deep learning to document binarization. By iteratively refining the image and separating enhancement from binarization, they introduce a method that significantly improves the visual quality and accuracy of binarized outputs, especially for highly degraded historical documents. This research lays a foundation for future studies to build upon in advancing document image processing.