Exploring Strategies for Generalizable Commonsense Reasoning with Pre-trained Models (2109.02837v1)

Published 7 Sep 2021 in cs.CL

Abstract: Commonsense reasoning benchmarks have been largely solved by fine-tuning LLMs. The downside is that fine-tuning may cause models to overfit to task-specific data and thereby forget their knowledge gained during pre-training. Recent works only propose lightweight model updates as models may already possess useful knowledge from past experience, but a challenge remains in understanding what parts and to what extent models should be refined for a given task. In this paper, we investigate what models learn from commonsense reasoning datasets. We measure the impact of three different adaptation methods on the generalization and accuracy of models. Our experiments with two models show that fine-tuning performs best, by learning both the content and the structure of the task, but suffers from overfitting and limited generalization to novel answers. We observe that alternative adaptation methods like prefix-tuning have comparable accuracy, but generalize better to unseen answers and are more robust to adversarial splits.
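The contrast the abstract draws is between full fine-tuning (all pre-trained weights are updated) and lightweight adaptation such as prefix-tuning (the base model stays frozen and only a small number of trainable "virtual token" embeddings, prepended to the input, are learned). The sketch below is not the paper's code; it is a minimal PyTorch illustration of that idea, with a hypothetical `base_encoder` that consumes input embeddings.

```python
# Minimal sketch of prefix-tuning (not the paper's implementation):
# freeze the pre-trained encoder and learn only `prefix_len` virtual
# token embeddings that are prepended to every input sequence.
import torch
import torch.nn as nn


class PrefixTunedEncoder(nn.Module):
    def __init__(self, base_encoder: nn.Module, hidden_size: int, prefix_len: int = 10):
        super().__init__()
        self.base = base_encoder
        # Freeze all pre-trained parameters; only the prefix is trainable.
        for p in self.base.parameters():
            p.requires_grad = False
        self.prefix = nn.Parameter(torch.randn(prefix_len, hidden_size) * 0.02)

    def forward(self, token_embeddings: torch.Tensor) -> torch.Tensor:
        # token_embeddings: (batch, seq_len, hidden_size)
        batch = token_embeddings.size(0)
        prefix = self.prefix.unsqueeze(0).expand(batch, -1, -1)
        # Prepend the learned prefix and run the frozen encoder.
        return self.base(torch.cat([prefix, token_embeddings], dim=1))
```

Under full fine-tuning, by contrast, every parameter of `base_encoder` would also receive gradient updates, which is what the paper links to stronger in-distribution accuracy but weaker generalization to unseen answers.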

Authors (6)
  1. Kaixin Ma (35 papers)
  2. Filip Ilievski (53 papers)
  3. Jonathan Francis (48 papers)
  4. Satoru Ozaki (1 paper)
  5. Eric Nyberg (39 papers)
  6. Alessandro Oltramari (19 papers)
Citations (16)