
Convolutional Self-Attention Network (1810.13320v2)

Published 31 Oct 2018 in cs.CL

Abstract: Self-attention network (SAN) has recently attracted increasing interest due to its fully parallelized computation and flexibility in modeling dependencies. It can be further enhanced with a multi-headed attention mechanism that allows the model to jointly attend to information from different representation subspaces at different positions (Vaswani et al., 2017). In this work, we propose a novel convolutional self-attention network (CSAN), which offers SAN the abilities to 1) capture neighboring dependencies, and 2) model the interaction between multiple attention heads. Experimental results on the WMT14 English-to-German translation task demonstrate that the proposed approach outperforms both the strong Transformer baseline and other existing works on enhancing the locality of SAN. Compared with previous work, our model does not introduce any new parameters.
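The core locality idea described in the abstract, restricting each query to attend only to its neighboring positions, can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the head-interaction component of CSAN is omitted, and the function name, window size, and shapes are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def local_self_attention(Q, K, V, window=3):
    """Self-attention where each query attends only to keys inside a
    centered 1-D window, illustrating the 'neighboring dependencies' idea.
    Q, K, V: arrays of shape (seq_len, d_model); `window` is an odd span.
    (Illustrative sketch; not the paper's exact CSAN formulation.)"""
    seq_len, d = Q.shape
    scores = Q @ K.T / np.sqrt(d)                      # (seq_len, seq_len)
    # Mask out positions outside the local window around each query.
    idx = np.arange(seq_len)
    mask = np.abs(idx[:, None] - idx[None, :]) > window // 2
    scores = np.where(mask, -1e9, scores)
    weights = softmax(scores, axis=-1)
    return weights @ V

# Toy usage: 8 positions, model width 16.
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16))
out = local_self_attention(x, x, x, window=3)
print(out.shape)  # (8, 16)
```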

Authors (5)
  1. Baosong Yang (57 papers)
  2. Longyue Wang (87 papers)
  3. Derek F. Wong (69 papers)
  4. Lidia S. Chao (41 papers)
  5. Zhaopeng Tu (135 papers)
Citations (11)