Structural analysis of an all-purpose question answering model (2104.06045v1)

Published 13 Apr 2021 in cs.CL and cs.LG

Abstract: Attention is a key component of the now ubiquitous pre-trained LLMs. By learning to focus on relevant pieces of information, these Transformer-based architectures have proven capable of tackling several tasks at once and sometimes even surpass their single-task counterparts. To better understand this phenomenon, we conduct a structural analysis of a new all-purpose question answering model that we introduce. Surprisingly, this model retains single-task performance even in the absence of a strong transfer effect between tasks. Through attention head importance scoring, we observe that attention heads specialize in a particular task and that some heads are more conducive to learning than others in both the multi-task and single-task settings.
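The abstract attributes its findings to attention head importance scoring but does not spell out the scoring procedure. Below is a minimal, illustrative sketch of one common gradient-based approach (in the style of Michel et al., 2019), where each head's importance is the accumulated magnitude of the loss gradient with respect to a fixed multiplicative gate on that head. All class, function, and tensor names here are assumptions for illustration, not the paper's implementation.

```python
# Illustrative sketch of gradient-based attention head importance scoring.
# Each head h is gated by a mask fixed at 1; |d loss / d mask_h| accumulated
# over a batch serves as that head's importance score.
import torch
import torch.nn as nn


class MaskedMultiHeadAttention(nn.Module):
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.out = nn.Linear(d_model, d_model)
        # One gate per head; kept at 1 and never optimized, used only
        # to read out gradients as importance scores.
        self.head_mask = nn.Parameter(torch.ones(n_heads))

    def forward(self, x):  # x: (batch, seq, d_model)
        b, s, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)

        def split(t):  # -> (batch, heads, seq, d_head)
            return t.view(b, s, self.n_heads, self.d_head).transpose(1, 2)

        q, k, v = split(q), split(k), split(v)
        attn = torch.softmax(q @ k.transpose(-2, -1) / self.d_head ** 0.5, dim=-1)
        ctx = attn @ v                                  # (batch, heads, seq, d_head)
        ctx = ctx * self.head_mask.view(1, -1, 1, 1)    # gate each head's output
        ctx = ctx.transpose(1, 2).reshape(b, s, -1)
        return self.out(ctx)


def head_importance(model, loss_fn, batches):
    """Accumulate |d loss / d head_mask| over batches for every masked layer."""
    scores = {name: torch.zeros_like(m.head_mask)
              for name, m in model.named_modules()
              if isinstance(m, MaskedMultiHeadAttention)}
    for x, y in batches:
        model.zero_grad()
        loss_fn(model(x), y).backward()
        for name, m in model.named_modules():
            if isinstance(m, MaskedMultiHeadAttention):
                scores[name] += m.head_mask.grad.abs()
    return scores
```

Ranking heads by these per-layer scores is what allows statements like "some heads are more conducive to learning than others" to be made quantitatively, e.g. by comparing the score distributions obtained in the multi-task and single-task settings.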

Authors (4)
  1. Vincent Micheli (8 papers)
  2. Quentin Heinrich (3 papers)
  3. François Fleuret (78 papers)
  4. Wacim Belblidia (3 papers)
Citations (3)