
Improving BERT Pretraining with Syntactic Supervision (2104.10516v1)

Published 21 Apr 2021 in cs.CL and cs.LG

Abstract: Bidirectional masked Transformers have become the core theme in the current NLP landscape. Despite their impressive benchmarks, recent research has repeatedly questioned such models' capacity for syntactic generalization. In this work, we seek to address this question by adding a supervised, token-level supertagging objective to standard unsupervised pretraining, enabling the explicit incorporation of syntactic biases into the network's training dynamics. Our approach is straightforward to implement, incurs only marginal computational overhead, and is general enough to adapt to a variety of settings. We apply our methodology to Lassy Large, an automatically annotated corpus of written Dutch. Our experiments suggest that our syntax-aware model performs on par with established baselines, despite Lassy Large being one order of magnitude smaller than commonly used corpora.
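To make the joint objective described in the abstract concrete, here is a minimal sketch (not the authors' code) of a BERT-style encoder trained with the standard masked-language-modelling loss plus a supervised, token-level supertagging loss. The model sizes, vocabulary size, supertag inventory size, and the mixing weight `lambda_supertag` are illustrative placeholders, not values from the paper.

```python
# Sketch of joint MLM + supertagging pretraining (assumed setup, PyTorch).
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB_SIZE = 30_000      # word-piece vocabulary size (placeholder)
NUM_SUPERTAGS = 5_000    # size of the supertag inventory (placeholder)
HIDDEN = 256
IGNORE_INDEX = -100      # unlabeled positions are skipped in both losses


class SyntaxAwareBert(nn.Module):
    """Transformer encoder with two token-level heads: MLM and supertagging."""

    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, HIDDEN)
        layer = nn.TransformerEncoderLayer(
            d_model=HIDDEN, nhead=4, dim_feedforward=4 * HIDDEN, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.mlm_head = nn.Linear(HIDDEN, VOCAB_SIZE)          # predicts masked tokens
        self.supertag_head = nn.Linear(HIDDEN, NUM_SUPERTAGS)  # predicts one supertag per token

    def forward(self, input_ids):
        states = self.encoder(self.embed(input_ids))
        return self.mlm_head(states), self.supertag_head(states)


def joint_loss(model, input_ids, mlm_labels, supertag_labels, lambda_supertag=1.0):
    """MLM loss plus a weighted token-level supertagging loss."""
    mlm_logits, tag_logits = model(input_ids)
    mlm_loss = F.cross_entropy(
        mlm_logits.view(-1, VOCAB_SIZE), mlm_labels.view(-1),
        ignore_index=IGNORE_INDEX,
    )
    tag_loss = F.cross_entropy(
        tag_logits.view(-1, NUM_SUPERTAGS), supertag_labels.view(-1),
        ignore_index=IGNORE_INDEX,
    )
    return mlm_loss + lambda_supertag * tag_loss


if __name__ == "__main__":
    model = SyntaxAwareBert()
    input_ids = torch.randint(0, VOCAB_SIZE, (2, 16))          # toy batch of token ids
    mlm_labels = torch.full((2, 16), IGNORE_INDEX)             # only masked positions get labels
    mlm_labels[:, 3] = input_ids[:, 3]                         # pretend position 3 was masked
    supertags = torch.randint(0, NUM_SUPERTAGS, (2, 16))       # one supertag label per token
    loss = joint_loss(model, input_ids, mlm_labels, supertags)
    loss.backward()
    print(float(loss))
```

The point of the sketch is only that the supertagging head shares the encoder with the MLM head, so the syntactic supervision shapes the same representations used for masked-token prediction; the actual architecture and loss weighting used in the paper may differ.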

Authors (4)
  1. Giorgos Tziafas (3 papers)
  2. Konstantinos Kogkalidis (15 papers)
  3. Gijs Wijnholds (13 papers)
  4. Michael Moortgat (18 papers)
Citations (3)
