Zemi: Learning Zero-Shot Semi-Parametric Language Models from Multiple Tasks (2210.00185v2)

Published 1 Oct 2022 in cs.CL

Abstract: Although LLMs have achieved impressive zero-shot ability, their huge model size generally incurs high cost. Recently, semi-parametric language models, which augment a smaller language model with an external retriever, have demonstrated promising language modeling capabilities. However, it remains unclear whether such semi-parametric language models can perform as competitively as their fully-parametric counterparts on zero-shot generalization to downstream tasks. In this work, we introduce $\text{Zemi}$, a zero-shot semi-parametric language model. To the best of our knowledge, this is the first semi-parametric language model that demonstrates strong zero-shot performance on a wide range of held-out unseen tasks. We train $\text{Zemi}$ with a novel semi-parametric multitask prompted training paradigm, which shows significant improvement compared with the parametric multitask training proposed by T0. Specifically, we augment the multitask training and zero-shot evaluation with retrieval from a large-scale task-agnostic unlabeled corpus. In order to incorporate multiple potentially noisy retrieved augmentations, we further propose a novel $\text{augmentation fusion}$ module leveraging perceiver resampler and gated cross-attention. Notably, our proposed $\text{Zemi}_\text{LARGE}$ outperforms T0-3B by 16% on all seven evaluation tasks while being 3.9x smaller in model size.

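To give a concrete sense of the augmentation fusion module mentioned in the abstract, below is a minimal PyTorch sketch of perceiver-style resampling followed by tanh-gated cross-attention over retrieved passages. It is an illustration under assumed shapes and hyperparameters (the names `GatedCrossAttentionFusion`, `d_model`, and `n_latents` are placeholders), not the authors' implementation.

```python
# Hedged sketch (not the paper's code): fusing multiple retrieved augmentations
# into the encoder hidden states via a perceiver-style resampler and
# tanh-gated cross-attention. All sizes are illustrative assumptions.
import torch
import torch.nn as nn

class GatedCrossAttentionFusion(nn.Module):
    def __init__(self, d_model=512, n_heads=8, n_latents=64):
        super().__init__()
        # Learned latents that resample each (potentially long, noisy)
        # retrieved passage into a fixed number of summary vectors.
        self.latents = nn.Parameter(torch.randn(n_latents, d_model))
        self.resampler = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Cross-attention from the task input's hidden states to the resampled
        # augmentation vectors, gated by a scalar initialized at zero.
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.gate = nn.Parameter(torch.zeros(1))

    def forward(self, hidden, aug_list):
        # hidden:   (batch, seq_len, d_model) encoded task input
        # aug_list: list of (batch, aug_len, d_model) encoded retrieved passages
        batch = hidden.size(0)
        latents = self.latents.unsqueeze(0).expand(batch, -1, -1)
        # Resample each augmentation to a fixed-size set of vectors.
        resampled = [self.resampler(latents, aug, aug)[0] for aug in aug_list]
        # Concatenate the summaries of all retrieved augmentations.
        memory = torch.cat(resampled, dim=1)
        fused, _ = self.cross_attn(hidden, memory, memory)
        # Tanh-gated residual keeps noisy retrievals from overwhelming
        # the original representation.
        return hidden + torch.tanh(self.gate) * fused
```

With the gate initialized at zero, the fused model starts out equivalent to the unaugmented backbone, a common choice that tends to keep training stable when the retrieved augmentations are noisy.
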
Authors (6)
  1. Zhenhailong Wang (17 papers)
  2. Xiaoman Pan (25 papers)
  3. Dian Yu (78 papers)
  4. Dong Yu (329 papers)
  5. Jianshu Chen (66 papers)
  6. Heng Ji (266 papers)
Citations (9)
