
DocTrack: A Visually-Rich Document Dataset Really Aligned with Human Eye Movement for Machine Reading (2310.14802v1)

Published 23 Oct 2023 in cs.HC, cs.CV, and cs.IR

Abstract: The use of visually-rich documents (VRDs) in various fields has created a demand for Document AI models that can read and comprehend documents like humans, which requires the overcoming of technical, linguistic, and cognitive barriers. Unfortunately, the lack of appropriate datasets has significantly hindered advancements in the field. To address this issue, we introduce DocTrack, a VRD dataset really aligned with human eye-movement information using eye-tracking technology. This dataset can be used to investigate the challenges mentioned above. Additionally, we explore the impact of human reading order on document understanding tasks and examine what would happen if a machine reads in the same order as a human. Our results suggest that although Document AI models have made significant progress, they still have a long way to go before they can read VRDs as accurately, continuously, and flexibly as humans do. These findings have potential implications for future research and development of Document AI models. The data is available at https://github.com/hint-lab/doctrack.

Authors (6)
  1. Hao Wang
  2. Qingxuan Wang
  3. Yue Li
  4. Changqing Wang
  5. Chenhui Chu
  6. Rui Wang
Citations (2)
