Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
102 tokens/sec
GPT-4o
59 tokens/sec
Gemini 2.5 Pro Pro
43 tokens/sec
o3 Pro
6 tokens/sec
GPT-4.1 Pro
50 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

DocSegTr: An Instance-Level End-to-End Document Image Segmentation Transformer (2201.11438v2)

Published 27 Jan 2022 in cs.CV

Abstract: Understanding documents with rich layouts is an essential step towards information extraction. Business intelligence processes often require the extraction of useful semantic content from documents at a large scale for subsequent decision-making tasks. In this context, instance-level segmentation of different document objects (title, sections, figures etc.) has emerged as an interesting problem for the document analysis and understanding community. To advance the research in this direction, we present a transformer-based model called \emph{DocSegTr} for end-to-end instance segmentation of complex layouts in document images. The method adapts a twin attention module, for semantic reasoning, which helps to become highly computationally efficient compared with the state-of-the-art. To the best of our knowledge, this is the first work on transformer-based document segmentation. Extensive experimentation on competitive benchmarks like PubLayNet, PRIMA, Historical Japanese (HJ) and TableBank demonstrate that our model achieved comparable or better segmentation performance than the existing state-of-the-art approaches with the average precision of 89.4, 40.3, 83.4 and 93.3. This simple and flexible framework could serve as a promising baseline for instance-level recognition tasks in document images.

User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (4)
  1. Sanket Biswas (31 papers)
  2. Ayan Banerjee (159 papers)
  3. Umapada Pal (80 papers)
  4. Josep Lladós (40 papers)
Citations (21)