Injecting Hierarchy with U-Net Transformers (1910.10488v2)

Published 16 Oct 2019 in cs.LG, cs.CL, and stat.ML

Abstract: The Transformer architecture has become increasingly popular over the past two years, owing to its impressive performance on a number of NLP tasks. However, all Transformer computations occur at the level of word representations and therefore, it may be argued that Transformer models do not explicitly attempt to learn hierarchical structure which is widely assumed to be integral to language. In the present work, we introduce hierarchical processing into the Transformer model, taking inspiration from the U-Net architecture, popular in computer vision for its hierarchical view of natural images. We empirically demonstrate that the proposed architecture outperforms both the vanilla Transformer and some strong baselines in the domain of chit-chat dialogue.
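
To make the proposed architecture concrete, below is a minimal sketch of a U-Net-style Transformer encoder block. This is not the authors' released code: the module name `UNetTransformerEncoder`, the hyperparameters, and the choice of average pooling for downsampling and token repetition for upsampling are illustrative assumptions, and the paper's actual pooling and merging operators may differ. The sketch only illustrates the two U-Net ingredients the abstract describes: attention at a coarsened sequence resolution and a skip connection merging fine and coarse features.

```python
# Minimal sketch (not the authors' code) of a U-Net-style Transformer block:
# token representations are pooled to a coarser sequence, processed, then
# upsampled and fused with the fine-grained features via a skip connection.
import torch
import torch.nn as nn

class UNetTransformerEncoder(nn.Module):
    def __init__(self, d_model=256, nhead=4, dim_ff=512):
        super().__init__()
        make_layer = lambda: nn.TransformerEncoderLayer(
            d_model, nhead, dim_ff, batch_first=True)
        self.down_block = make_layer()   # fine-grained (word-level) attention
        self.bottleneck = make_layer()   # coarse-grained (phrase-level) attention
        self.up_block = make_layer()     # fine-grained attention after merging
        self.merge = nn.Linear(2 * d_model, d_model)  # fuse the skip connection

    def forward(self, x):                       # x: (batch, seq_len, d_model)
        fine = self.down_block(x)               # full-resolution features
        # Downsample: average-pool adjacent token pairs (seq_len halves).
        coarse = fine.transpose(1, 2)
        coarse = nn.functional.avg_pool1d(coarse, kernel_size=2, stride=2)
        coarse = coarse.transpose(1, 2)
        coarse = self.bottleneck(coarse)        # attend over the coarse sequence
        # Upsample: repeat each coarse token to restore the sequence length.
        up = coarse.repeat_interleave(2, dim=1)
        # U-Net skip connection: concatenate fine and upsampled coarse features.
        merged = self.merge(torch.cat([fine, up], dim=-1))
        return self.up_block(merged)

x = torch.randn(2, 16, 256)                     # even seq_len assumed here
print(UNetTransformerEncoder()(x).shape)        # torch.Size([2, 16, 256])
```

Stacking several such blocks, each halving the sequence length again, would yield a deeper hierarchy, analogous to the multiple resolution levels of the original U-Net for images.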

Authors (3)
  1. David Donahue (5 papers)
  2. Vladislav Lialin (14 papers)
  3. Anna Rumshisky (42 papers)
Citations (1)
