
Doc-GCN: Heterogeneous Graph Convolutional Networks for Document Layout Analysis (2208.10970v2)

Published 22 Aug 2022 in cs.CV and cs.LG

Abstract: Recognizing the layout of unstructured digital documents is crucial when parsing the documents into a structured, machine-readable format for downstream applications. Recent studies in Document Layout Analysis (DLA) usually rely on computer vision models to understand documents while ignoring other information, such as context information or the relations among document components, which are vital to capture. Our Doc-GCN presents an effective way to harmonize and integrate heterogeneous aspects for Document Layout Analysis. We first construct graphs to explicitly describe four main aspects: syntactic, semantic, density, and appearance/visual information. Then, we apply graph convolutional networks to represent each aspect of information and use pooling to integrate them. Finally, we aggregate each aspect and feed them into 2-layer MLPs for document layout component classification. Our Doc-GCN achieves new state-of-the-art results on three widely used DLA datasets.
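
The pipeline outlined in the abstract (one graph per aspect, a graph convolution per graph, pooling, then a 2-layer MLP) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. The abstract does not give the details, so the following are all assumptions made for illustration: the single Kipf-and-Welling-style GCN layer, the element-wise max used as the pooling step, the toy dimensions, the random untrained weights, and every function name.

```python
import math
import random

ASPECTS = ["syntactic", "semantic", "density", "appearance"]

def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def relu(m):
    return [[max(0.0, v) for v in row] for row in m]

def softmax(v):
    peak = max(v)
    exps = [math.exp(x - peak) for x in v]
    total = sum(exps)
    return [e / total for e in exps]

def normalize_adjacency(adj):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 with self-loops added."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]

def gcn_layer(adj, feats, weight):
    """One graph convolution: ReLU(A_norm @ X @ W)."""
    return relu(matmul(matmul(normalize_adjacency(adj), feats), weight))

def classify_components(graphs, gcn_weights, mlp_w1, mlp_w2):
    """graphs maps aspect name -> (adjacency, node features); nodes are layout components."""
    per_aspect = [gcn_layer(graphs[a][0], graphs[a][1], gcn_weights[a]) for a in ASPECTS]
    n_nodes, dim = len(per_aspect[0]), len(per_aspect[0][0])
    # Assumed pooling: element-wise max over the four aspect embeddings of each node.
    pooled = [[max(emb[i][k] for emb in per_aspect) for k in range(dim)]
              for i in range(n_nodes)]
    hidden = relu(matmul(pooled, mlp_w1))          # MLP layer 1
    logits = matmul(hidden, mlp_w2)                # MLP layer 2
    return [softmax(row) for row in logits]        # class probabilities per component

def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

# Toy input: 3 layout components, 4 input features, 5 hidden units, 3 classes.
rng = random.Random(0)
chain = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]  # hypothetical edges
graphs = {a: (chain, random_matrix(3, 4, rng)) for a in ASPECTS}
gcn_weights = {a: random_matrix(4, 5, rng) for a in ASPECTS}
probs = classify_components(graphs, gcn_weights,
                            random_matrix(5, 5, rng), random_matrix(5, 3, rng))
```

Here `probs` holds one probability vector per layout component. Because the weights are random, the predictions carry no meaning; the sketch only shows how the four aspect graphs could flow into a single classifier.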

Authors (5)
  1. Siwen Luo (14 papers)
  2. Yihao Ding (16 papers)
  3. Siqu Long (18 papers)
  4. Josiah Poon (41 papers)
  5. Soyeon Caren Han (48 papers)
Citations (12)
