Do Neural Language Models Show Preferences for Syntactic Formalisms? (2004.14096v1)

Published 29 Apr 2020 in cs.CL and cs.LG

Abstract: Recent work on the interpretability of deep neural language models has concluded that many properties of natural language syntax are encoded in their representational spaces. However, such studies often suffer from limited scope by focusing on a single language and a single linguistic formalism. In this study, we aim to investigate the extent to which the semblance of syntactic structure captured by language models adheres to a surface-syntactic or deep syntactic style of analysis, and whether the patterns are consistent across different languages. We apply a probe for extracting directed dependency trees to BERT and ELMo models trained on 13 different languages, probing for two different syntactic annotation styles: Universal Dependencies (UD), prioritizing deep syntactic relations, and Surface-Syntactic Universal Dependencies (SUD), focusing on surface structure. We find that both models exhibit a preference for UD over SUD, with interesting variations across languages and layers, and that the strength of this preference is correlated with differences in tree shape.
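To make the probing setup more concrete, the sketch below shows one common shape such a dependency probe can take: a learned bilinear scorer over frozen contextual embeddings that rates every candidate (head, dependent) pair, followed by decoding a head for each token. This is a minimal illustration under assumptions, not the paper's exact probe; the function names, dimensions, random vectors, and the greedy decoder are all hypothetical stand-ins (a real probe would decode a well-formed directed tree, e.g. with Chu-Liu/Edmonds).

```python
# Minimal, illustrative sketch of a "head-selection" dependency probe.
# NOT the authors' exact method: names, dimensions, and the decoder are
# assumptions chosen only to keep the example short and runnable.
import numpy as np

rng = np.random.default_rng(0)

def score_heads(embeddings: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Score every candidate (head, dependent) pair with a bilinear form.

    embeddings: (n_tokens, dim) frozen vectors from a model such as BERT or ELMo.
    W:          (dim, dim) learned probe parameters.
    Returns an (n_tokens, n_tokens) matrix where entry [h, d] scores token h
    as the syntactic head of dependent d.
    """
    return embeddings @ W @ embeddings.T

def decode_heads(scores: np.ndarray) -> np.ndarray:
    """Pick the highest-scoring head for each dependent.

    Greedy head selection is used here only for brevity; it does not
    guarantee a tree, unlike maximum-spanning-arborescence decoding.
    """
    np.fill_diagonal(scores, -np.inf)   # a token cannot head itself
    return scores.argmax(axis=0)        # head index for each dependent

# Toy usage with random vectors standing in for contextual embeddings.
n_tokens, dim = 6, 16
embeddings = rng.standard_normal((n_tokens, dim))
W = rng.standard_normal((dim, dim))     # in practice, trained on UD or SUD trees
predicted_heads = decode_heads(score_heads(embeddings, W))
print(predicted_heads)
```

Training the same probe once on UD trees and once on SUD trees, then comparing attachment accuracy per language and per layer, is the kind of comparison the paper's preference analysis rests on.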

Authors (4)
  1. Artur Kulmizev (11 papers)
  2. Vinit Ravishankar (11 papers)
  3. Mostafa Abdou (18 papers)
  4. Joakim Nivre (30 papers)
Citations (41)