Evaluation of "Cappy: Outperforming and Boosting Large Multi-Task LMs with a Small Scorer"
The paper introduces Cappy, a pretrained small scorer designed to enhance both the performance and efficiency of large multi-task language models. The work responds to challenges inherent in deploying and adapting existing LLMs, whose parameter counts range from billions to hundreds of billions and whose computational and memory demands scale accordingly.
Overview of Large Multi-Task LLMs
LLMs such as T0, FLAN, and OPT-IML represent a new paradigm in NLP: multi-task learning within an instruction-following framework. Trained on large collections of tasks phrased as natural-language instructions, these models generalize impressively, even to tasks unseen during training. Their massive size, however, makes both training and inference costly and slow. Adapting them to specific downstream applications is equally difficult, especially for intricate tasks that exceed what prompt tuning and other parameter-efficient techniques can handle.
The Cappy Approach
Cappy tackles these challenges with a lightweight scorer of just 360 million parameters, which takes an instruction and a candidate response and outputs a score estimating how well the response answers the instruction. It serves a dual role:
- Independent Usage: As a standalone scorer on classification tasks, Cappy surpasses multi-task LLMs that are orders of magnitude larger (see the sketch after this list).
- Auxiliary Role: Applied alongside another LLM, Cappy acts as a performant add-on that improves the LLM's predictions without requiring fine-tuning or direct access to its parameters. This sidesteps the constraints posed by the closed nature of several advanced multi-task LLMs, such as OPT-IML-175B and FLAN-PaLM-540B.
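To make the independent usage concrete, below is a minimal sketch of how a Cappy-style scorer resolves a classification task: the scorer assigns each candidate answer a correctness score given the instruction, and the highest-scoring candidate becomes the prediction. The checkpoint ID and the single-logit regression head are assumptions for illustration, not the paper's released artifacts.

```python
# Minimal sketch of Cappy's independent usage on a classification task.
# Assumptions (not from the paper's code): the checkpoint ID, and that the
# scorer loads as a one-logit sequence-classification (regression) head.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_ID = "org/cappy-360m"  # hypothetical checkpoint ID
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)
model.eval()

def cappy_score(instruction: str, response: str) -> float:
    """Score how well `response` answers `instruction` (higher is better)."""
    inputs = tokenizer(instruction, response, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    return logits.squeeze().item()  # single regression logit used as the score

# Classification reduces to scoring each candidate answer and taking the max.
instruction = "Is the sentiment of this review positive or negative?\nReview: ..."
candidates = ["positive", "negative"]
prediction = max(candidates, key=lambda c: cappy_score(instruction, c))
```

This argmax-over-candidates pattern is why a small cross-encoder can compete with far larger generators on classification: it only has to rank a handful of known options, not generate free text.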
Empirical Validation
The paper provides empirical evidence of Cappy's effectiveness:
- Used on its own across 11 language understanding tasks drawn from PromptSource, Cappy consistently outperformed multi-task LLMs that are orders of magnitude larger, demonstrating strong parameter efficiency.
- Across a broader set of 45 diverse and complex tasks from BIG-Bench, Cappy significantly improved the performance of advanced LLMs such as FLAN-T5 by rescoring their candidate outputs, as sketched below.
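The boosting setup can be sketched as follows: sample several candidate generations from the frozen LLM, score each with Cappy, and return the top scorer. The snippet reuses the `cappy_score` helper above; the generator checkpoint and sampling hyperparameters are illustrative, not the paper's exact configuration.

```python
# Sketch of Cappy as an add-on to a frozen generator (here, a FLAN-T5 model).
# Sampling settings are illustrative; the paper's configuration may differ.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

gen_tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-large")
generator = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-large")
generator.eval()

def boosted_generate(instruction: str, num_candidates: int = 8) -> str:
    """Sample candidates from the frozen LLM; return the one Cappy scores highest."""
    inputs = gen_tokenizer(instruction, return_tensors="pt", truncation=True)
    with torch.no_grad():
        outputs = generator.generate(
            **inputs,
            do_sample=True,            # temperature sampling for diverse candidates
            temperature=0.9,
            num_return_sequences=num_candidates,
            max_new_tokens=64,
        )
    candidates = gen_tokenizer.batch_decode(outputs, skip_special_tokens=True)
    return max(candidates, key=lambda c: cappy_score(instruction, c))
```

Because the generator is never updated, the same loop works for API-only models where only sampled text, not parameters, is available.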
Theoretical and Practical Implications
Integrating a lightweight model like Cappy has both theoretical and practical implications:
- Theoretically, it shows that multi-task training strategies can be refined by letting small models act as robust evaluative components alongside large generators.
- Practically, Cappy lets colossal LLMs be adapted to specific tasks without exhaustive tuning, making sophisticated AI models accessible to a wider range of applications and to users with limited computational resources.
Future Directions
Looking forward, the research opens up several intriguing avenues:
- Refinement of the weakly-supervised data augmentation approach, potentially integrating human feedback to further improve Cappy's scoring accuracy (the underlying label-generation recipe is sketched after this list).
- Cross-domain experiments that extend Cappy's architecture to new problem settings, improving instruction adherence and performance on highly specialized tasks.
- Exploration of dynamic collaboration schemes in which Cappy or similar scorers filter and combine outputs from multiple LLMs, leveraging their complementary strengths.
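For context on the first direction, here is a simplified sketch of the weakly-supervised label generation that Cappy's pretraining builds on: candidate responses are assigned regression targets by comparing them to the ground-truth response with Rouge-L, so no human scoring is required. The helper names are illustrative, and this compresses the paper's full data-construction pipeline into its core idea.

```python
# Simplified sketch of weakly-supervised score labels for scorer training:
# each candidate response is labeled with its Rouge-L F1 against the gold
# answer, so (instruction, response, score) triples need no human annotation.
from rouge_score import rouge_scorer  # pip install rouge-score

_rouge = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)

def weak_label(ground_truth: str, candidate: str) -> float:
    """Regression target in [0, 1]: Rouge-L F1 of the candidate vs. the reference."""
    return _rouge.score(ground_truth, candidate)["rougeL"].fmeasure

def build_training_triples(instruction, ground_truth, sampled_candidates):
    """Pair the gold answer (score 1.0) with sampled candidates and weak labels."""
    triples = [(instruction, ground_truth, 1.0)]
    triples += [(instruction, c, weak_label(ground_truth, c))
                for c in sampled_candidates]
    return triples
```

Human feedback, as suggested above, would slot in here as a higher-quality replacement or complement for these automatic Rouge-L targets.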
In summary, the paper presents a compelling case for Cappy, a model that challenges the assumption that ever-larger LLMs are necessary and sets a precedent for more adaptable, efficient models in the NLP landscape. As AI research progresses, tools like Cappy are likely to become integral to bridging the gap between resource-heavy models and their practical application in diverse, real-world settings.