Explicit Sparse Transformer: Concentrated Attention Through Explicit Selection (1912.11637v1)

Published 25 Dec 2019 in cs.CL and cs.LG

Abstract: The self-attention based Transformer has demonstrated state-of-the-art performance in a number of natural language processing tasks. Self-attention is able to model long-term dependencies, but it may suffer from the extraction of irrelevant information in the context. To tackle the problem, we propose a novel model called Explicit Sparse Transformer. Explicit Sparse Transformer improves the concentration of attention on the global context through an explicit selection of the most relevant segments. Extensive experimental results on a series of natural language processing and computer vision tasks, including neural machine translation, image captioning, and language modeling, all demonstrate the advantages of Explicit Sparse Transformer in model performance. We also show that our proposed sparse attention method achieves comparable or better results than the previous sparse attention method, but significantly reduces training and testing time. For example, the inference speed is twice that of sparsemax in the Transformer model. Code will be available at https://github.com/lancopku/Explicit-Sparse-Transformer.
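
The core mechanism described in the abstract, keeping only the most relevant attention scores per query and masking the rest before the softmax, can be sketched roughly as follows. This is a minimal illustrative sketch, not the authors' implementation: the function name `explicit_sparse_attention`, the `top_k` default, and the exact masking scheme are assumptions; consult the linked repository for the official code.

```python
# Sketch of top-k sparse attention: keep the k largest scores per query,
# mask the remainder to -inf so softmax assigns them ~zero weight.
# (top_k value and masking details are illustrative assumptions.)
import torch
import torch.nn.functional as F

def explicit_sparse_attention(q, k, v, top_k=8):
    """Scaled dot-product attention with explicit top-k selection.

    q, k, v: tensors of shape (batch, heads, seq_len, head_dim).
    top_k:   number of attention scores kept per query (assumed hyperparameter).
    """
    d = q.size(-1)
    scores = torch.matmul(q, k.transpose(-2, -1)) / d ** 0.5  # (B, H, Lq, Lk)

    # Find the k-th largest score for each query position.
    kth = min(top_k, scores.size(-1))
    topk_vals, _ = scores.topk(kth, dim=-1)
    threshold = topk_vals[..., -1, None]  # shape (B, H, Lq, 1)

    # Mask everything below the threshold before the softmax.
    sparse_scores = scores.masked_fill(scores < threshold, float("-inf"))

    attn = F.softmax(sparse_scores, dim=-1)
    return torch.matmul(attn, v)

# Example usage with random tensors:
# q = k = v = torch.randn(2, 8, 16, 64)
# out = explicit_sparse_attention(q, k, v, top_k=8)  # -> (2, 8, 16, 64)
```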

Authors (6)
  1. Guangxiang Zhao (17 papers)
  2. Junyang Lin (99 papers)
  3. Zhiyuan Zhang (129 papers)
  4. Xuancheng Ren (59 papers)
  5. Qi Su (58 papers)
  6. Xu Sun (194 papers)
Citations (94)