MarkupLM: Pre-training of Text and Markup Language for Visually-rich Document Understanding (2110.08518v2)

Published 16 Oct 2021 in cs.CL

Abstract: Multimodal pre-training with text, layout, and image has made significant progress for Visually Rich Document Understanding (VRDU), especially for fixed-layout documents such as scanned document images. However, there are still a large number of digital documents whose layout is not fixed and must be rendered interactively and dynamically for visualization, which makes existing layout-based pre-training approaches hard to apply. In this paper, we propose MarkupLM for document understanding tasks that use markup languages as the backbone, such as HTML/XML-based documents, where text and markup information are jointly pre-trained. Experimental results show that the pre-trained MarkupLM significantly outperforms existing strong baseline models on several document understanding tasks. The pre-trained model and code will be publicly available at https://aka.ms/markuplm.
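
To make "text and markup information" concrete, the following is a minimal sketch of how each text node in an HTML document could be paired with the markup path that leads to it. It is an illustration only, not the released MarkupLM code: the abstract does not say how markup is encoded, so the XPath-like path format, the class name `TextWithPathExtractor`, and the helper `extract_text_with_paths` are all assumptions made for this example.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag; they are counted but not pushed.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class TextWithPathExtractor(HTMLParser):
    """Hypothetical helper: collect (text, XPath-like path) pairs."""

    def __init__(self):
        super().__init__()
        self.stack = []            # open elements as (tag, 1-based sibling index)
        self.child_counts = [{}]   # per-depth counters of tags seen so far
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        counts = self.child_counts[-1]
        counts[tag] = counts.get(tag, 0) + 1
        if tag in VOID_TAGS:
            return
        self.stack.append((tag, counts[tag]))
        self.child_counts.append({})

    def handle_endtag(self, tag):
        # Assumes otherwise well-formed input: end tags match the open element.
        if self.stack and self.stack[-1][0] == tag:
            self.stack.pop()
            self.child_counts.pop()

    def handle_data(self, data):
        text = data.strip()
        if text:
            path = "/" + "/".join(f"{t}[{i}]" for t, i in self.stack)
            self.pairs.append((text, path))


def extract_text_with_paths(html):
    """Return a list of (text, path) pairs in document order."""
    parser = TextWithPathExtractor()
    parser.feed(html)
    parser.close()
    return parser.pairs


if __name__ == "__main__":
    sample = "<html><body><ul><li>A</li><li>B</li></ul><p>C</p></body></html>"
    for text, path in extract_text_with_paths(sample):
        print(text, path)
```

A model that consumes such pairs sees both the words and their position in the markup tree, without needing rendered coordinates, which is the situation the abstract describes for documents whose layout is not fixed.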

Authors (4)

Junlong Li
Yiheng Xu
Lei Cui
Furu Wei

Citations (54)
