How Truncating Weights Improves Reasoning in Language Models (2406.03068v1)

Published 5 Jun 2024 in cs.LG, cs.AI, cs.CL, and stat.ML

Abstract: In addition to the ability to generate fluent text in various languages, LLMs have been successful at tasks that involve basic forms of logical "reasoning" over their context. Recent work found that selectively removing certain components from weight matrices in pre-trained models can improve such reasoning capabilities. We investigate this phenomenon further by carefully studying how certain global associations tend to be stored in specific weight components or Transformer blocks, in particular feed-forward layers. Such associations may hurt predictions in reasoning tasks, and removing the corresponding components may then improve performance. We analyze how this arises during training, both empirically and theoretically, on a two-layer Transformer trained on a basic reasoning task with noise, a toy associative memory model, and on the Pythia family of pre-trained models tested on simple reasoning tasks.
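
The weight truncation discussed in the abstract amounts to dropping selected components of a weight matrix, most commonly by replacing a feed-forward projection with a low-rank SVD approximation. The sketch below illustrates this general idea on a Pythia checkpoint, since the paper evaluates the Pythia family; the model size, layer index, and retained rank are illustrative assumptions, not the paper's exact settings.

```python
# Minimal sketch: rank-truncating one feed-forward weight matrix via SVD.
# Assumes a Hugging Face Pythia checkpoint; the layer index and retained
# rank below are hypothetical choices for illustration only.
import torch
from transformers import GPTNeoXForCausalLM

model = GPTNeoXForCausalLM.from_pretrained("EleutherAI/pythia-410m")

def truncate_weight(weight: torch.Tensor, keep_rank: int) -> torch.Tensor:
    """Return a low-rank approximation keeping the top `keep_rank` singular components."""
    U, S, Vh = torch.linalg.svd(weight, full_matrices=False)
    return U[:, :keep_rank] @ torch.diag(S[:keep_rank]) @ Vh[:keep_rank, :]

# Replace the output projection of one MLP block with its truncated version.
layer_idx, keep_rank = 5, 64  # hypothetical layer and rank
ffn = model.gpt_neox.layers[layer_idx].mlp.dense_4h_to_h
with torch.no_grad():
    ffn.weight.copy_(truncate_weight(ffn.weight, keep_rank))
```

The paper's analysis suggests why such truncation can help: certain components of feed-forward weights tend to store global associations that override in-context evidence, so removing them can improve predictions on reasoning tasks.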

Authors (3)
  1. Lei Chen (484 papers)
  2. Joan Bruna (119 papers)
  3. Alberto Bietti (35 papers)
Citations (2)