
How Does Selective Mechanism Improve Self-Attention Networks? (2005.00979v1)

Published 3 May 2020 in cs.CL and cs.LG

Abstract: Self-attention networks (SANs) with a selective mechanism have produced substantial improvements in various NLP tasks by concentrating on a subset of input words. However, the underlying reasons for their strong performance have not been well explained. In this paper, we bridge the gap by assessing the strengths of selective SANs (SSANs), which are implemented with a flexible and universal Gumbel-Softmax. Experimental results on several representative NLP tasks, including natural language inference, semantic role labelling, and machine translation, show that SSANs consistently outperform the standard SANs. Through well-designed probing experiments, we empirically validate that the improvement of SSANs can be attributed in part to mitigating two commonly-cited weaknesses of SANs: word order encoding and structure modeling. Specifically, the selective mechanism improves SANs by paying more attention to content words that contribute to the meaning of the sentence. The code and data are released at https://github.com/xwgeng/SSAN.
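The abstract states that the selective mechanism is implemented with Gumbel-Softmax, i.e. each query learns to keep or discard individual input words while remaining differentiable. The sketch below is a rough illustration of that idea only, not the authors' released code: the function name selective_self_attention, the gating projection w_s, and the single-head setup are assumptions made here for clarity, using PyTorch's built-in gumbel_softmax.

```python
import torch
import torch.nn.functional as F

def selective_self_attention(x, w_q, w_k, w_v, w_s, tau=1.0, hard=True):
    """Single-head self-attention where a Gumbel-Softmax gate decides,
    per query-key pair, whether the key is selected or discarded.

    x: (batch, seq_len, d_model); w_q/w_k/w_v/w_s: (d_model, d_model).
    Names and shapes are illustrative, not the paper's implementation.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    d = q.size(-1)

    # Standard scaled dot-product attention scores.
    scores = q @ k.transpose(-2, -1) / d ** 0.5              # (B, L, L)

    # Binary select/discard logits per query-key pair, sampled with the
    # straight-through Gumbel-Softmax so the choice stays differentiable.
    sel_scores = q @ (x @ w_s).transpose(-2, -1)              # (B, L, L)
    two_way = torch.stack([sel_scores, torch.zeros_like(sel_scores)], dim=-1)
    gate = F.gumbel_softmax(two_way, tau=tau, hard=hard)[..., 0]  # 1 = keep

    # Attend only over the selected subset and renormalise the weights.
    attn = torch.softmax(scores, dim=-1) * gate
    attn = attn / attn.sum(dim=-1, keepdim=True).clamp_min(1e-9)
    return attn @ v


# Tiny usage example with random projections.
B, L, D = 2, 5, 16
x = torch.randn(B, L, D)
w_q, w_k, w_v, w_s = (torch.randn(D, D) * D ** -0.5 for _ in range(4))
out = selective_self_attention(x, w_q, w_k, w_v, w_s)
print(out.shape)  # torch.Size([2, 5, 16])
```

With hard=True the gate is a 0/1 selection at the forward pass while gradients flow through the soft relaxation, which is the usual straight-through Gumbel-Softmax trade-off; the actual SSAN design in the paper may differ in how the selection logits are parameterised.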

Authors (6)
  1. Xinwei Geng (6 papers)
  2. Longyue Wang (87 papers)
  3. Xing Wang (191 papers)
  4. Bing Qin (186 papers)
  5. Ting Liu (329 papers)
  6. Zhaopeng Tu (135 papers)
Citations (33)
