
Skim-Attention: Learning to Focus via Document Layout (2109.01078v1)

Published 2 Sep 2021 in cs.CL

Abstract: Transformer-based pre-training techniques for text and layout have proven effective in a number of document understanding tasks. Despite this success, multimodal pre-training models suffer from very high computational and memory costs. Motivated by human reading strategies, this paper presents Skim-Attention, a new attention mechanism that takes advantage of the structure of the document and its layout. Skim-Attention attends only to the 2-dimensional positions of the words in a document. Our experiments show that Skim-Attention obtains a lower perplexity than prior works while being more computationally efficient. Skim-Attention can be further combined with long-range Transformers to efficiently process long documents. We also show how Skim-Attention can be used off-the-shelf as a mask for any pre-trained language model, improving its performance while restricting attention. Finally, we show the emergence of a document structure representation in Skim-Attention.
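
Based on the abstract alone, the core idea is that attention scores are computed from layout (2-D position) information only, while the text representations are what gets aggregated. The following is a minimal sketch under that assumption, in PyTorch style; the class name `SkimAttention`, the `layout_dim` parameter, and the single-head projection layout are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of the skim-attention idea (illustrative, not the paper's code):
# attention weights are derived from 2-D layout embeddings only, then applied
# to the text (semantic) embeddings. All names and dimensions are assumptions.
import math
import torch
import torch.nn as nn

class SkimAttention(nn.Module):
    def __init__(self, hidden_dim: int, layout_dim: int):
        super().__init__()
        # Queries and keys come from layout (2-D position) embeddings only.
        self.q_proj = nn.Linear(layout_dim, hidden_dim)
        self.k_proj = nn.Linear(layout_dim, hidden_dim)
        # Values come from the text embeddings.
        self.v_proj = nn.Linear(hidden_dim, hidden_dim)
        self.scale = 1.0 / math.sqrt(hidden_dim)

    def forward(self, text_emb: torch.Tensor, layout_emb: torch.Tensor) -> torch.Tensor:
        # text_emb:   (batch, seq_len, hidden_dim)
        # layout_emb: (batch, seq_len, layout_dim), e.g. embedded bounding boxes
        q = self.q_proj(layout_emb)
        k = self.k_proj(layout_emb)
        v = self.v_proj(text_emb)
        # Attention depends only on layout, so these weights could in principle
        # be computed once and reused, or thresholded into a sparse mask that
        # restricts the attention of a separate pre-trained language model.
        attn = torch.softmax(q @ k.transpose(-2, -1) * self.scale, dim=-1)
        return attn @ v
```

In this reading, the "mask for any pre-trained language model" usage mentioned in the abstract would correspond to reusing (or sparsifying) the layout-derived `attn` matrix inside another model's attention layers rather than returning `attn @ v` directly.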

Authors (4)
  1. Laura Nguyen (2 papers)
  2. Thomas Scialom (35 papers)
  3. Jacopo Staiano (38 papers)
  4. Benjamin Piwowarski (38 papers)
Citations (8)