Combining Morphological and Histogram based Text Line Segmentation in the OCR Context (2103.08922v4)

Published 16 Mar 2021 in cs.CV

Abstract: Text line segmentation is one of the preprocessing stages of modern optical character recognition systems. The algorithmic approach proposed in this paper was designed for this exact purpose. Its main characteristic is the combination of two different techniques: morphological image operations and horizontal histogram projections. The method was developed for a historic data collection that commonly features quality issues, such as degraded paper, blurred text, or the presence of noise. For that reason, the segmenter could be of particular interest to cultural institutions that want access to robust line bounding boxes for a given historic document. Because of the promising segmentation results, combined with low computational cost, the algorithm was incorporated into the OCR pipeline of the National Library of Luxembourg as part of the initiative to reprocess its historic newspaper collection. The general contribution of this paper is to outline the approach and to evaluate the gains in accuracy and speed, comparing it to the segmentation algorithm bundled with the open source OCR software used.
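
To make the two ingredients named in the abstract concrete, here is a minimal sketch of how a morphological operation and a horizontal histogram projection can be chained to obtain line bounding boxes. It is not the paper's algorithm: the function names, the choice of a purely horizontal dilation, and the `radius` and `min_ink` parameters are all assumptions made for illustration, and the page is assumed to be already binarized (1 = ink, 0 = background).

```python
def dilate_horizontal(img, radius):
    """Binary dilation with a 1 x (2*radius + 1) structuring element.

    Smears ink sideways so that neighbouring glyphs merge into one blob.
    (Illustrative helper, not taken from the paper.)
    """
    height, width = len(img), len(img[0])
    out = [[0] * width for _ in range(height)]
    for y, row in enumerate(img):
        for x, value in enumerate(row):
            if value:
                lo = max(0, x - radius)
                hi = min(width, x + radius + 1)
                for i in range(lo, hi):
                    out[y][i] = 1
    return out


def horizontal_projection(img):
    """Count ink pixels in every row (the horizontal histogram)."""
    return [sum(row) for row in img]


def segment_lines(img, radius=2, min_ink=1):
    """Return line boxes as (left, top, right, bottom), inclusive.

    Rows whose smeared ink count reaches `min_ink` (assumed >= 1) are
    grouped into consecutive runs; each run is treated as one text line.
    """
    smeared = dilate_horizontal(img, radius)
    profile = horizontal_projection(smeared)
    boxes = []
    start = None
    # A trailing 0 acts as a sentinel that closes a run touching the bottom edge.
    for y, count in enumerate(profile + [0]):
        if count >= min_ink and start is None:
            start = y
        elif count < min_ink and start is not None:
            rows = img[start:y]
            # Horizontal extent is measured on the original, unsmeared ink.
            cols = [x for x in range(len(img[0])) if any(r[x] for r in rows)]
            if cols:
                boxes.append((cols[0], start, cols[-1], y - 1))
            start = None
    return boxes


page = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 0, 0, 0],
]
print(segment_lines(page))  # [(1, 1, 3, 2), (2, 4, 5, 4)]
```

With `min_ink=1` the dilation does not change which rows count as text; it only starts to matter when `min_ink` is raised to ignore rows containing isolated noise pixels, since smearing boosts the counts of rows that hold real glyphs. A production pipeline would more likely rely on an image library such as OpenCV for the morphology rather than nested Python loops.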
