
Go-tuning: Improving Zero-shot Learning Abilities of Smaller Language Models (2212.10461v1)

Published 20 Dec 2022 in cs.CL

Abstract: With increasing scale, language models demonstrate both quantitative improvement and new qualitative capabilities, especially as zero-shot learners, like GPT-3. However, these results rely heavily on delicate prompt design and large computation. In this work, we explore whether the strong zero-shot ability could be achieved at a smaller model scale without any external supervised data. To achieve this goal, we revisit masked language modeling and present a geometry-guided self-supervised learning method (Go-tuning for short) that takes a small amount of task-aware self-supervised data to further update language models. Experiments show that Go-tuning enables T5-small (80M) to achieve zero-shot results competitive with larger language models, such as T5-XL (3B). We also apply Go-tuning in multi-task settings and develop a multi-task model, mgo-T5 (250M). It reaches the average performance of OPT (175B) on 9 datasets.
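The abstract only outlines the recipe at a high level: continue self-supervised (masked / span-infilling) training of a small T5 on a handful of task-aware, unlabeled examples, with no external supervised data. The sketch below illustrates that generic self-supervised update step using Hugging Face Transformers; the geometry-guided data selection that gives Go-tuning its name is not described in the abstract, so the example texts and hyperparameters here are placeholder assumptions, not the authors' procedure.

```python
# Minimal sketch (assumed setup): continue T5-style span-infilling training of
# t5-small on a few task-aware unlabeled examples. This is NOT the full
# Go-tuning method; it only shows the self-supervised update the abstract
# alludes to.
import torch
from transformers import T5ForConditionalGeneration, T5TokenizerFast

model = T5ForConditionalGeneration.from_pretrained("t5-small")   # ~80M params
tokenizer = T5TokenizerFast.from_pretrained("t5-small")

# Placeholder "task-aware" unlabeled text with a masked span; Go-tuning would
# choose such data via its geometry-guided criterion (not specified here).
source = "The movie was surprisingly good and the acting was <extra_id_0>."
target = "<extra_id_0> excellent <extra_id_1>"   # hypothetical infill target

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
model.train()

inputs = tokenizer(source, return_tensors="pt")
labels = tokenizer(target, return_tensors="pt").input_ids

# Standard seq2seq masked-span objective: the model learns to reconstruct the
# masked span, updating the small LM without any labeled supervision.
loss = model(input_ids=inputs.input_ids,
             attention_mask=inputs.attention_mask,
             labels=labels).loss
loss.backward()
optimizer.step()
optimizer.zero_grad()
```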

Authors (4)
  1. Jingjing Xu (80 papers)
  2. Qingxiu Dong (39 papers)
  3. Hongyi Liu (26 papers)
  4. Lei Li (1293 papers)
Citations (1)