LORE: Logical Location Regression Network for Table Structure Recognition (2303.03730v1)

Published 7 Mar 2023 in cs.CV

Abstract: Table structure recognition (TSR) aims at extracting tables in images into machine-understandable formats. Recent methods solve this problem by predicting the adjacency relations of detected cell boxes, or learning to generate the corresponding markup sequences from the table images. However, they either count on additional heuristic rules to recover the table structures, or require a huge amount of training data and time-consuming sequential decoders. In this paper, we propose an alternative paradigm. We model TSR as a logical location regression problem and propose a new TSR framework called LORE, standing for LOgical location REgression network, which for the first time combines logical location regression together with spatial location regression of table cells. Our proposed LORE is conceptually simpler, easier to train and more accurate than previous TSR models of other paradigms. Experiments on standard benchmarks demonstrate that LORE consistently outperforms prior arts. Code is available at https://github.com/AlibabaResearch/AdvancedLiterateMachinery/tree/main/DocumentUnderstanding/LORE-TSR.

Citations (14)

Summary

  • The paper models table structure recognition as a logical location regression problem, using cascading regressors trained with inter-cell and intra-cell supervision.
  • Experiments demonstrate LORE's superior performance over methods like TGRNet and FLAGNet by accurately predicting cell relationships without complex post-processing.
  • This method streamlines conversion of tabular data from images and PDFs into structured databases, enhancing practical applications in automated document processing.

Overview of LORE: Logical Location Regression Network for Table Structure Recognition

This paper introduces LORE, an approach to Table Structure Recognition (TSR) that regresses the logical locations of table cells directly from images. TSR is increasingly important because tabular data is ubiquitous in documents and must be converted into machine-readable formats. Earlier methods either predict adjacency relations between detected cell boxes or generate markup sequences from the table image. The former rely on heuristic rules to recover the table structure; the latter require large amounts of training data and time-consuming sequential decoders. LORE avoids both limitations by modeling TSR as a logical location regression problem and combining it with spatial location regression of the cells.

LORE's Methodology and Results

LORE uses a convolutional neural network (CNN) to jointly regress the spatial and logical locations of table cells from the input image. Cascading regressors, trained with inter-cell and intra-cell supervision, capture the dependencies and constraints among the cells' logical locations. Because all cells are predicted in parallel, inference needs neither complicated post-processing nor sequential decoding.
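To make the idea of a logical location concrete, the sketch below assumes each cell's logical location is a tuple of four integer indices (starting row, ending row, starting column, ending column) and that the network emits four real values per cell that are rounded to the nearest index. The `Cell` class, `to_cell`, `build_grid`, and the sample numbers are hypothetical names and data chosen for illustration, not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """One table cell: a spatial box plus an assumed four-index logical location."""
    box: tuple        # (x1, y1, x2, y2) in pixels
    row_start: int
    row_end: int
    col_start: int
    col_end: int


def to_cell(box, raw_logical):
    """Round four regressed real values to non-negative integer indices."""
    rs, re, cs, ce = (max(0, round(v)) for v in raw_logical)
    # Force end >= start so every cell covers at least one grid slot.
    return Cell(box, rs, max(rs, re), cs, max(cs, ce))


def build_grid(cells):
    """Place cells on a row-by-column grid; raise if two cells claim one slot."""
    n_rows = max(c.row_end for c in cells) + 1
    n_cols = max(c.col_end for c in cells) + 1
    grid = [[None] * n_cols for _ in range(n_rows)]
    for idx, c in enumerate(cells):
        for r in range(c.row_start, c.row_end + 1):
            for col in range(c.col_start, c.col_end + 1):
                if grid[r][col] is not None:
                    raise ValueError(f"cells {grid[r][col]} and {idx} overlap at ({r}, {col})")
                grid[r][col] = idx
    return grid


# Made-up predictions: a header spanning two columns above two body cells.
predictions = [
    ((10, 10, 210, 40), (0.1, -0.2, 0.05, 1.1)),
    ((10, 40, 110, 70), (0.9, 1.2, 0.1, 0.0)),
    ((110, 40, 210, 70), (1.1, 0.8, 0.9, 1.2)),
]
cells = [to_cell(box, raw) for box, raw in predictions]
grid = build_grid(cells)
```

Here `grid` holds, for each grid slot, the index of the cell that covers it. The overlap check is one simple way to test whether a set of predicted logical locations is mutually consistent, which is the kind of constraint the inter-cell supervision is meant to encourage.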

Experimental results show that LORE outperforms previous methods across the reported metrics. It predicts logical locations more accurately than existing models such as TGRNet and Res2TIM. Adjacency relations and markup sequences can both be derived from its output, and on their respective metrics LORE outperforms FLAGNet, NCGM, and EDD.
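As a rough illustration of why logical locations are enough to recover the other two output formats, the sketch below derives adjacency pairs and an HTML markup string from a list of logical locations. It assumes each location is a `(row_start, row_end, col_start, col_end)` tuple of integer indices. The function names `adjacency` and `to_html` and the adjacency definition used here (cells that touch along a shared row or column range) are illustrative assumptions, not the evaluation protocol of the paper.

```python
import html


def adjacency(cells):
    """Return (horizontal, vertical) sets of index pairs (i, j) where j is
    directly right of, or directly below, cell i."""
    horizontal, vertical = set(), set()
    for i, (rs1, re1, cs1, ce1) in enumerate(cells):
        for j, (rs2, re2, cs2, ce2) in enumerate(cells):
            if i == j:
                continue
            rows_overlap = rs1 <= re2 and rs2 <= re1
            cols_overlap = cs1 <= ce2 and cs2 <= ce1
            if rows_overlap and cs2 == ce1 + 1:
                horizontal.add((i, j))
            if cols_overlap and rs2 == re1 + 1:
                vertical.add((i, j))
    return horizontal, vertical


def to_html(cells, texts):
    """Emit table markup, turning multi-row or multi-column cells into spans."""
    n_rows = max(re for _, re, _, _ in cells) + 1
    by_row = {}
    for idx, (rs, _, cs, _) in enumerate(cells):
        by_row.setdefault(rs, []).append((cs, idx))
    parts = ["<table>"]
    for r in range(n_rows):
        parts.append("<tr>")
        for cs, idx in sorted(by_row.get(r, [])):
            rs, re, _, ce = cells[idx]
            attrs = ""
            if re > rs:
                attrs += f' rowspan="{re - rs + 1}"'
            if ce > cs:
                attrs += f' colspan="{ce - cs + 1}"'
            parts.append(f"<td{attrs}>{html.escape(texts[idx])}</td>")
        parts.append("</tr>")
    parts.append("</table>")
    return "".join(parts)


# A header spanning two columns above two body cells (made-up example).
logical = [(0, 0, 0, 1), (1, 1, 0, 0), (1, 1, 1, 1)]
horizontal, vertical = adjacency(logical)
markup = to_html(logical, ["Header", "a", "b"])
```

Both conversions are plain index arithmetic with no learned component and no heuristics about pixel geometry, which fits the claim that adjacency relations and markup can be read off the model's output.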

Implications and Future Directions

Treating TSR as a logical location regression problem is a promising direction for improving both the accuracy and the efficiency of table structure recognition. The simplicity of the approach suggests it could adapt to a range of document types and table complexities, broadening its potential applications in automated document processing and information retrieval.

Practical implications include streamlined conversion of PDF and image-based tables into structured databases, which benefits systems for analytics, recommendation, and automated reporting. More broadly, LORE's success in capturing interdependent relationships could inform other computer vision and AI modeling tasks in which spatial and logical dependencies coexist.

Future research could integrate LORE with text recognition systems to improve semantic extraction from documents. Adaptations for more complex layouts, such as nested tables or dynamically changing formats, could further broaden its usability. Scaling LORE to real-time applications and mobile platforms is another potential avenue, one that would require balancing computational efficiency against accuracy.