
MATrIX -- Modality-Aware Transformer for Information eXtraction (2205.08094v1)

Published 17 May 2022 in cs.CV

Abstract: We present MATrIX - a Modality-Aware Transformer for Information eXtraction in the Visual Document Understanding (VDU) domain. VDU covers information extraction from visually rich documents such as forms, invoices, receipts, tables, graphs, presentations, or advertisements. In these documents, text semantics and visual information supplement each other to provide a global understanding of the document. MATrIX is pre-trained in an unsupervised way with specifically designed tasks that require the use of multi-modal information (spatial, visual, or textual). We consider the spatial and text modalities all at once in a single token set. To make the attention more flexible, we use a learned modality-aware relative bias in the attention mechanism to modulate the attention between tokens of different modalities. We evaluate MATrIX on 3 different datasets, each with strong baselines.
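
The central mechanism named in the abstract is the learned modality-aware relative bias: a small table of learned scalars, indexed by the modalities of the query and key tokens, added to the attention logits before the softmax. Below is a minimal PyTorch sketch of that idea under stated assumptions; the class `ModalityAwareAttention`, the two-modality setup, and all shapes are illustrative inventions, not the paper's implementation, and the paper's "relative" bias likely also conditions on token positions, which this sketch omits.

```python
import torch
import torch.nn as nn


class ModalityAwareAttention(nn.Module):
    """Single-head self-attention with a learned modality-aware bias.

    Hypothetical illustration: a learned scalar b[m_q, m_k] is added to
    every attention logit, indexed by the modality of the query token
    (m_q) and of the key token (m_k). Names and shapes are assumptions,
    not the paper's code.
    """

    def __init__(self, dim: int, num_modalities: int = 2):
        super().__init__()
        self.scale = dim ** -0.5
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        # One learned scalar per (query modality, key modality) pair.
        self.modality_bias = nn.Parameter(
            torch.zeros(num_modalities, num_modalities)
        )

    def forward(self, x: torch.Tensor, modality_ids: torch.Tensor) -> torch.Tensor:
        # x: (seq_len, dim); modality_ids: (seq_len,) integer modality labels.
        q, k, v = self.q(x), self.k(x), self.v(x)
        logits = (q @ k.transpose(-2, -1)) * self.scale  # (seq_len, seq_len)
        # bias[i, j] = modality_bias[modality_ids[i], modality_ids[j]]
        bias = self.modality_bias[modality_ids][:, modality_ids]
        attn = torch.softmax(logits + bias, dim=-1)
        return attn @ v


# Usage: 6 tokens, the first three textual (0), the rest spatial (1).
x = torch.randn(6, 64)
modality_ids = torch.tensor([0, 0, 0, 1, 1, 1])
out = ModalityAwareAttention(dim=64)(x, modality_ids)  # (6, 64)
```

Because the bias depends only on the modality pair, it adds just num_modalities² parameters while letting the model modulate, say, text-to-visual attention independently of text-to-text attention, which matches the abstract's goal of making attention across a single mixed token set more flexible.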

Authors (5)
  1. Thomas Delteil
  2. Edouard Belval
  3. Lei Chen
  4. Luis Goncalves
  5. Vijay Mahadevan
Citations (3)