Zemi: Learning Zero-Shot Semi-Parametric Language Models from Multiple Tasks (2210.00185v2)

Published 1 Oct 2022 in cs.CL

Abstract: Although LLMs have achieved impressive zero-shot ability, their huge model size generally incurs high cost. Recently, semi-parametric LLMs, which augment a smaller LLM with an external retriever, have demonstrated promising language modeling capabilities. However, it remains unclear whether such semi-parametric LLMs can perform as competitively as their fully-parametric counterparts on zero-shot generalization to downstream tasks. In this work, we introduce $\text{Zemi}$, a zero-shot semi-parametric LLM. To the best of our knowledge, this is the first semi-parametric LLM to demonstrate strong zero-shot performance on a wide range of held-out unseen tasks. We train $\text{Zemi}$ with a novel semi-parametric multitask prompted training paradigm, which shows significant improvement over the parametric multitask training proposed by T0. Specifically, we augment both multitask training and zero-shot evaluation with retrieval from a large-scale, task-agnostic unlabeled corpus. To incorporate multiple potentially noisy retrieved augmentations, we further propose a novel $\text{augmentation fusion}$ module leveraging a perceiver resampler and gated cross-attention. Notably, our proposed $\text{Zemi}_\text{LARGE}$ outperforms T0-3B by 16% on all seven evaluation tasks while being 3.9x smaller in model size.
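
The abstract describes fusing multiple retrieved augmentations via a perceiver resampler and gated cross-attention. The sketch below is a minimal, illustrative PyTorch rendering of that general idea, not the paper's actual implementation; all module names, dimensions, and initialization choices are assumptions.

```python
# Illustrative sketch (not Zemi's code): compress retrieved passages with
# learned latent queries (perceiver-resampler style), then inject them into
# the LM hidden states via a zero-initialized gated cross-attention.
import torch
import torch.nn as nn


class GatedCrossAttentionFusion(nn.Module):
    """Hypothetical augmentation-fusion block over multiple noisy retrievals."""

    def __init__(self, d_model: int, n_heads: int = 8, n_latents: int = 64):
        super().__init__()
        # Learned latent queries that summarize all retrieved tokens.
        self.latents = nn.Parameter(torch.randn(n_latents, d_model) * 0.02)
        self.resampler = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # tanh(0) = 0, so the block is a no-op at initialization and the model
        # learns how much to trust the retrieved augmentations during training.
        self.gate = nn.Parameter(torch.zeros(1))

    def forward(self, hidden: torch.Tensor, retrieved: torch.Tensor) -> torch.Tensor:
        # hidden:    (batch, seq_len, d_model) language-model hidden states
        # retrieved: (batch, n_docs * doc_len, d_model) encoded retrieved tokens
        b = hidden.size(0)
        latents = self.latents.unsqueeze(0).expand(b, -1, -1)
        # Resample: compress all retrieved tokens into a fixed set of latents.
        summary, _ = self.resampler(latents, retrieved, retrieved)
        # Cross-attend: let the LM states read from the compressed augmentations.
        fused, _ = self.cross_attn(hidden, summary, summary)
        return hidden + torch.tanh(self.gate) * fused


if __name__ == "__main__":
    block = GatedCrossAttentionFusion(d_model=512)
    h = torch.randn(2, 16, 512)          # toy hidden states
    docs = torch.randn(2, 3 * 32, 512)   # 3 retrieved passages of 32 tokens each
    print(block(h, docs).shape)          # torch.Size([2, 16, 512])
```

The fixed number of latents keeps the cross-attention cost independent of how many passages are retrieved, which is one plausible motivation for the resampler step.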

Citations (9)
