CascadeTabNet: An approach for end to end table detection and structure recognition from image-based documents (2004.12629v2)

Published 27 Apr 2020 in cs.CV

Abstract: An automatic table recognition method for interpretation of tabular data in document images majorly involves solving two problems of table detection and table structure recognition. The prior work involved solving both problems independently using two separate approaches. More recent works signify the use of deep learning-based solutions while also attempting to design an end to end solution. In this paper, we present an improved deep learning-based end to end approach for solving both problems of table detection and structure recognition using a single Convolution Neural Network (CNN) model. We propose CascadeTabNet: a Cascade mask Region-based CNN High-Resolution Network (Cascade mask R-CNN HRNet) based model that detects the regions of tables and recognizes the structural body cells from the detected tables at the same time. We evaluate our results on ICDAR 2013, ICDAR 2019 and TableBank public datasets. We achieved 3rd rank in ICDAR 2019 post-competition results for table detection while attaining the best accuracy results for the ICDAR 2013 and TableBank dataset. We also attain the highest accuracy results on the ICDAR 2019 table structure recognition dataset. Additionally, we demonstrate effective transfer learning and image augmentation techniques that enable CNNs to achieve very accurate table detection results. Code and dataset has been made available at: https://github.com/DevashishPrasad/CascadeTabNet

Authors (5)

Devashish Prasad
Ayan Gadpal
Kshitij Kapadni
Manish Visave
Kavita Sultanpure

Summary

CascadeTabNet: An Expert Analysis

The paper "CascadeTabNet: An approach for end-to-end table detection and structure recognition from image-based documents" presents a deep learning-based method for automatic table recognition in document images. It addresses table detection and table structure recognition together, using a single Convolutional Neural Network (CNN) model.

Problem Definition

Extracting structured information from tables in document images involves two tasks: locating the table region, and recognizing the structure within it, such as rows, columns, and body cells. Earlier methods typically solved these two problems independently with separate approaches. This work handles both with a single CNN model, which simplifies the pipeline and may improve accuracy and speed.

Methodology

The authors propose CascadeTabNet, a model built on Cascade mask R-CNN with a High-Resolution Network (HRNet) backbone. It detects table regions and recognizes the body cells of the detected tables at the same time. The model performs instance segmentation to predict table and cell regions, and it classifies each table as bordered or borderless. The predicted cell segmentation is used only for borderless tables; bordered tables are handled with rule-based algorithms.

Key Features:

Image Segmentation: Implements pixel-level segmentation, predicting both tables and internal cell structures during a single inference run.
Classification: Distinguishes between bordered and borderless tables, utilizing rule-based algorithms for bordered tables to enhance efficiency.
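
The bordered/borderless split described above can be illustrated with a small sketch. This is not the authors' code: the `Detection` and `Table` types, the label names, the score threshold, and the centre-containment rule for assigning cells to tables are all assumptions made for illustration. It only shows the routing idea, where borderless tables keep the predicted cell regions and bordered tables are flagged for a separate rule-based step.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels


@dataclass
class Detection:
    label: str    # "bordered", "borderless" or "cell" (hypothetical label names)
    box: Box
    score: float


@dataclass
class Table:
    box: Box
    kind: str
    cells: List[Box] = field(default_factory=list)
    needs_rule_based_step: bool = False


def center_inside(inner: Box, outer: Box) -> bool:
    """True if the centre of `inner` lies within `outer`."""
    cx = (inner[0] + inner[2]) / 2
    cy = (inner[1] + inner[3]) / 2
    return outer[0] <= cx <= outer[2] and outer[1] <= cy <= outer[3]


def group_detections(dets: List[Detection], min_score: float = 0.5) -> List[Table]:
    """Route each detected table according to its predicted class."""
    kept = [d for d in dets if d.score >= min_score]
    tables = [Table(d.box, d.label) for d in kept
              if d.label in ("bordered", "borderless")]
    cells = [d.box for d in kept if d.label == "cell"]
    for table in tables:
        if table.kind == "bordered":
            # Bordered tables: leave cell recovery to a rule-based step.
            table.needs_rule_based_step = True
        else:
            # Borderless tables: keep the predicted cells, in reading order.
            inside = [c for c in cells if center_inside(c, table.box)]
            table.cells = sorted(inside, key=lambda c: (c[1], c[0]))
    return tables
```

Sorting by the top edge and then the left edge gives a rough reading order; a real pipeline would need a more careful row and column assignment.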

Transfer Learning and Image Augmentation

A major contribution of the research is the use of iterative transfer learning together with image augmentation. The authors show that a CNN can reach high accuracy with limited data by fine-tuning pre-trained models and by applying dilation and smudge transformations to enlarge the training set.
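
As a rough illustration of the dilation idea, the sketch below thickens the ink pixels of a binarized page using plain Python lists. It is a minimal sketch under stated assumptions, not the paper's implementation: the binary encoding (1 = ink), the square neighbourhood, the `radius` parameter, and the `augment` helper are all hypothetical, and the smudge transformation is not covered here.

```python
from typing import List, Tuple

Grid = List[List[int]]  # binarized image: 1 = ink (dark pixel), 0 = background


def dilate(img: Grid, radius: int = 1) -> Grid:
    """Thicken ink by setting every pixel within `radius` of an ink pixel."""
    height = len(img)
    width = len(img[0]) if height else 0
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if img[y][x]:
                # Stamp a (2*radius+1)-wide square, clipped to the image bounds.
                for ny in range(max(0, y - radius), min(height, y + radius + 1)):
                    for nx in range(max(0, x - radius), min(width, x + radius + 1)):
                        out[ny][nx] = 1
    return out


def augment(dataset: List[Tuple[Grid, object]]) -> List[Tuple[Grid, object]]:
    """Return the originals plus one dilated copy of each image.

    Dilation leaves the page size unchanged, so this sketch reuses the
    annotations of each image for its dilated copy.
    """
    return dataset + [(dilate(img), ann) for img, ann in dataset]
```

In practice an image library would do this far faster; the nested loops are kept only so the example has no dependencies.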

Evaluation and Results

The model is evaluated on the ICDAR 2013, ICDAR 2019, and TableBank public datasets. On the ICDAR 2019 table detection task, the approach ranks third in the post-competition results, and it attains the highest accuracy on the ICDAR 2019 table structure recognition dataset. On TableBank, the model surpasses previously reported results in terms of precision and recall.
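
For readers unfamiliar with how precision and recall are usually computed for detection, here is a minimal sketch based on intersection over union (IoU). The greedy one-to-one matching and the 0.6 threshold are assumptions chosen for illustration; the exact protocol of each benchmark may differ, and the function names are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def detection_scores(preds: List[Box], truths: List[Box],
                     threshold: float = 0.6) -> Tuple[float, float, float]:
    """Precision, recall and F1 with greedy one-to-one matching at `threshold`."""
    unmatched = list(truths)
    true_positives = 0
    for pred in preds:
        # Pick the remaining ground-truth box that overlaps this prediction most.
        best = max(unmatched, key=lambda t: iou(pred, t), default=None)
        if best is not None and iou(pred, best) >= threshold:
            true_positives += 1
            unmatched.remove(best)  # each ground-truth box matches at most once
    precision = true_positives / len(preds) if preds else 0.0
    recall = true_positives / len(truths) if truths else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1
```

A stricter threshold rewards tighter boxes, which is why detection results are often reported at more than one IoU level.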

Notable Achievements:

Achieved the best accuracy on ICDAR 2013 and all subsets of the TableBank dataset.
Demonstrated effective adaptation across various datasets with minimal training data through iterative transfer learning.

Implications and Future Directions

The unified approach of CascadeTabNet provides a comprehensive solution to table recognition tasks, potentially simplifying workflows in document digitization and analysis. The integration of image augmentation and transfer learning emphasizes the ability to maximize performance with limited resources, a significant consideration for real-world applications.

Future research might explore:

Enhancements in post-processing to further refine detection and structure recognition accuracy.
Adaptation of the model to other domains where table-like structures are prevalent, thereby extending its applicability.

The paper contributes a practical tool for document analysis, relevant to industries that depend on automated data extraction. Its methodological improvements, notably iterative transfer learning and image augmentation, suggest further scope for improving document processing models.
