Dependency Transformer Grammars: Integrating Dependency Structures into Transformer Language Models (2407.17406v1)

Published 24 Jul 2024 in cs.CL and cs.AI

Abstract: Syntactic Transformer LLMs aim to achieve better generalization by simultaneously modeling syntax trees and sentences. While prior work has focused on adding constituency-based structures to Transformers, we introduce Dependency Transformer Grammars (DTGs), a new class of Transformer LLM with an explicit dependency-based inductive bias. DTGs simulate dependency transition systems with constrained attention patterns by modifying attention masks, incorporate stack information through relative positional encoding, and augment dependency arc representations with a combination of token embeddings and operation embeddings. When trained on a dataset of sentences annotated with dependency trees, DTGs achieve better generalization while maintaining perplexity comparable to Transformer LLM baselines. DTGs also outperform recent constituency-based models, showing that dependency can better guide Transformer LLMs. Our code is released at https://github.com/zhaoyd1/Dep_Transformer_Grammars.
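
To make the "constrained attention simulates a transition system" idea concrete, here is a minimal sketch of how an attention mask could be derived from an arc-standard-style transition sequence, so that each step attends only to tokens currently on the parser stack. This is an illustrative assumption, not the authors' implementation; the transition names, mask semantics, and helper function are hypothetical.

```python
# Illustrative sketch (not the DTG implementation): build a boolean attention
# mask from a dependency transition sequence so that each step may attend only
# to tokens still on the simulated stack. Transition inventory and mask
# semantics are assumptions for exposition.

import torch

SHIFT, LEFT_ARC, RIGHT_ARC = "SHIFT", "LEFT-ARC", "RIGHT-ARC"

def stack_attention_mask(transitions):
    """Return a (T, T) boolean mask where entry (i, j) is True if step i may
    attend to step j, i.e. the token shifted at step j is still on the stack
    when step i is processed."""
    T = len(transitions)
    mask = torch.zeros(T, T, dtype=torch.bool)
    stack = []  # indices of SHIFT steps whose tokens remain on the stack
    for i, op in enumerate(transitions):
        # Each step sees itself plus whatever is currently on the stack.
        mask[i, i] = True
        for j in stack:
            mask[i, j] = True
        # Update the simulated stack after processing this step.
        if op == SHIFT:
            stack.append(i)
        elif op in (LEFT_ARC, RIGHT_ARC) and len(stack) >= 2:
            # An arc operation pops the dependent: the second-from-top for
            # LEFT-ARC, the top for RIGHT-ARC (arc-standard convention).
            stack.pop(-2 if op == LEFT_ARC else -1)
    return mask

# Example: "the cat sleeps" with an arc-standard-style derivation.
ops = [SHIFT, SHIFT, LEFT_ARC, SHIFT, LEFT_ARC]
print(stack_attention_mask(ops).int())
```

Under this kind of scheme, the mask alone enforces the structural constraint; the paper's relative positional encoding of stack depth and the combined token/operation embeddings for arcs would be layered on top of such masked attention.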

Authors (3)
  1. Yida Zhao (12 papers)
  2. Chao Lou (8 papers)
  3. Kewei Tu (74 papers)