Improving BERT with Syntax-aware Local Attention (2012.15150v2)

Published 30 Dec 2020 in cs.CL

Abstract: Pre-trained Transformer-based neural language models, such as BERT, have achieved remarkable results on a variety of NLP tasks. Recent works have shown that attention-based models can benefit from more focused attention over local regions. Most of them restrict the attention scope to a linear span, or are confined to certain tasks such as machine translation and question answering. In this paper, we propose a syntax-aware local attention, where the attention scopes are restricted based on distances in the syntactic structure. The proposed syntax-aware local attention can be integrated with pre-trained language models, such as BERT, to encourage the model to focus on syntactically relevant words. We conduct experiments on various single-sentence benchmarks, including sentence classification and sequence labeling tasks. Experimental results show consistent gains over BERT on all benchmark datasets. Extensive studies verify that our model achieves better performance owing to more focused attention over syntactically relevant words.
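The core idea of the abstract, restricting each token's attention to words that are close in the dependency tree rather than in the linear sequence, can be illustrated with a minimal sketch. The snippet below is an assumption-laden illustration, not the paper's implementation: the function names (`local_attention`, `syntactic_distance_mask`), the hop threshold `max_dist`, and the single-head, unbatched setup are all hypothetical simplifications; the paper integrates this masking into BERT's multi-head attention.

```python
import torch

def syntactic_distance_mask(dist, max_dist):
    """Boolean mask: True where token j lies within `max_dist` hops of
    token i in the dependency tree (dist is an [L, L] tree-distance matrix)."""
    return dist <= max_dist

def local_attention(q, k, v, dist, max_dist=2):
    """Scaled dot-product attention restricted to syntactically close tokens.
    q, k, v: [L, d] tensors; dist: [L, L] integer matrix of tree distances."""
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d ** 0.5           # [L, L] raw scores
    mask = syntactic_distance_mask(dist, max_dist)        # [L, L] bool
    scores = scores.masked_fill(~mask, float("-inf"))     # drop distant tokens
    attn = torch.softmax(scores, dim=-1)
    return attn @ v

# Toy usage: 5 tokens, hidden size 8, a hand-made tree-distance matrix
# (in practice, distances come from a dependency parse of the sentence).
L, d = 5, 8
q = k = v = torch.randn(L, d)
dist = torch.tensor([[0, 1, 2, 3, 2],
                     [1, 0, 1, 2, 1],
                     [2, 1, 0, 1, 2],
                     [3, 2, 1, 0, 3],
                     [2, 1, 2, 3, 0]])
out = local_attention(q, k, v, dist)
print(out.shape)  # torch.Size([5, 8])
```

The key design point the sketch captures is that locality is defined over the syntactic structure (tree hops) instead of a fixed linear window, so a syntactic head and its dependents can attend to each other even when they are far apart in the surface word order.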

Authors (5)
  1. Zhongli Li (11 papers)
  2. Qingyu Zhou (28 papers)
  3. Chao Li (429 papers)
  4. Ke Xu (309 papers)
  5. Yunbo Cao (43 papers)
Citations (42)