Layer-wise Representation Fusion for Compositional Generalization (2307.10799v2)

Published 20 Jul 2023 in cs.CL

Abstract: Existing neural models have been shown to struggle with compositional generalization (CG), i.e., the ability to systematically generalize to unseen compositions of seen components. A key reason for failure on CG is that the syntactic and semantic representations of sequences in the uppermost layers of both the encoder and decoder are entangled. However, previous work concentrates on separating the learning of syntax and semantics rather than exploring the causes of this representation entanglement (RE) problem in order to solve it. We explain why it arises by analyzing how representations evolve from the bottom to the top of the Transformer layers. We find that the "shallow" residual connections within each layer fail to fuse previous layers' information effectively, leading to information forgetting between layers and, in turn, to the RE problem. Inspired by this, we propose LRF, a novel Layer-wise Representation Fusion framework for CG, which learns to fuse previous layers' information back into the encoding and decoding process by introducing a fuse-attention module at each encoder and decoder layer. LRF achieves promising results on two realistic benchmarks, empirically demonstrating the effectiveness of our proposal.
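The abstract describes the fuse-attention module only at a high level, so the sketch below is an illustrative interpretation rather than the authors' exact formulation: the module name `FuseAttention`, the scaled dot-product form, the per-position attention over the stack of earlier layer outputs, and the residual-style combination are all assumptions made for illustration.

```python
# Hypothetical sketch of a layer-wise fusion module in the spirit of LRF.
# Not the paper's implementation: the attention form and residual combination
# are assumptions; only the idea of attending over previous layers' outputs
# comes from the abstract.
import torch
import torch.nn as nn


class FuseAttention(nn.Module):
    """Per position, attend over the outputs of all previous layers and fuse them
    back into the current layer's representation."""

    def __init__(self, d_model: int):
        super().__init__()
        self.query = nn.Linear(d_model, d_model)
        self.key = nn.Linear(d_model, d_model)
        self.value = nn.Linear(d_model, d_model)
        self.scale = d_model ** -0.5

    def forward(self, current: torch.Tensor, history: list[torch.Tensor]) -> torch.Tensor:
        # current: (batch, seq, d_model), output of the present encoder/decoder layer
        # history: outputs of all earlier layers, each (batch, seq, d_model)
        hist = torch.stack(history, dim=2)                 # (batch, seq, n_prev, d_model)
        q = self.query(current).unsqueeze(2)               # (batch, seq, 1, d_model)
        k = self.key(hist)                                 # (batch, seq, n_prev, d_model)
        v = self.value(hist)                               # (batch, seq, n_prev, d_model)
        attn = torch.softmax((q * k).sum(-1) * self.scale, dim=-1)  # (batch, seq, n_prev)
        fused = (attn.unsqueeze(-1) * v).sum(dim=2)        # (batch, seq, d_model)
        # Residual-style fusion of earlier-layer information into the current layer.
        return current + fused
```

In such a design, each Transformer layer would keep a list of all preceding layer outputs and pass it to its fusion module, so later layers can recover information that shallow residual connections alone would forget.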

Authors (9)
  1. Yafang Zheng (2 papers)
  2. Lei Lin (42 papers)
  3. Shuangtao Li (5 papers)
  4. Yuxuan Yuan (33 papers)
  5. Zhaohong Lai (1 paper)
  6. Shan Liu (94 papers)
  7. Biao Fu (8 papers)
  8. Yidong Chen (27 papers)
  9. Xiaodong Shi (34 papers)
Citations (1)
