
LLM Augmented LLMs: Expanding Capabilities through Composition (2401.02412v1)

Published 4 Jan 2024 in cs.LG, cs.AI, cs.CL, and cs.CV

Abstract: Foundational models with billions of parameters which have been trained on large corpora of data have demonstrated non-trivial skills in a variety of domains. However, due to their monolithic structure, it is challenging and expensive to augment them or impart new skills. On the other hand, due to their adaptation abilities, several new instances of these models are being trained towards new domains and tasks. In this work, we study the problem of efficient and practical composition of existing foundation models with more specific models to enable newer capabilities. To this end, we propose CALM -- Composition to Augment LLMs -- which introduces cross-attention between models to compose their representations and enable new capabilities. Salient features of CALM are: (i) Scales up LLMs on new tasks by 're-using' existing LLMs along with a few additional parameters and data, (ii) Existing model weights are kept intact, and hence preserves existing capabilities, and (iii) Applies to diverse domains and settings. We illustrate that augmenting PaLM2-S with a smaller model trained on low-resource languages results in an absolute improvement of up to 13% on tasks like translation into English and arithmetic reasoning for low-resource languages. Similarly, when PaLM2-S is augmented with a code-specific model, we see a relative improvement of 40% over the base model for code generation and explanation tasks -- on-par with fully fine-tuned counterparts.

Authors (9)
  1. Rachit Bansal (9 papers)
  2. Bidisha Samanta (14 papers)
  3. Siddharth Dalmia (36 papers)
  4. Nitish Gupta (27 papers)
  5. Shikhar Vashishth (23 papers)
  6. Sriram Ganapathy (72 papers)
  7. Abhishek Bapna (1 paper)
  8. Prateek Jain (131 papers)
  9. Partha Talukdar (51 papers)
Citations (26)

Summary

Overview of CALM Framework

Expanding what LLMs can do without incurring excessive cost has become a topic of intense research. The prevailing strategy is to pre-train LLMs on vast datasets to establish a broad foundation of skills and then fine-tune them for specific tasks. However, the monolithic architecture and sheer size of these models make it burdensome to inject new capabilities. An emerging approach explored in recent research instead composes existing foundational models with specialized counterparts, broadening functionality while remaining cost-effective.

Efficient Model Composition

The paper introduces CALM (Composition to Augment LLMs), a method that uses cross-attention to combine the representations of a foundational LLM with those of a specialized LLM. A pivotal advantage of the technique is that the pre-trained weights of the foundational model are kept intact, so previously acquired skills are preserved. With CALM, the composed model scales to novel tasks by re-using an existing domain-specific model together with only a small number of additional parameters and a modest amount of training data.
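
To make the composition concrete, the snippet below is a minimal, illustrative PyTorch sketch of a single cross-attention "bridge" between hidden states drawn from a frozen anchor LLM and a frozen augmenting LLM. The module names, hidden sizes, and number of heads are hypothetical and not the paper's exact configuration; the point is that only the bridge's parameters would be trained while both base models stay frozen.

```python
# Hypothetical sketch of CALM-style composition; layer choices and dimensions
# are illustrative, not the paper's configuration.
import torch
import torch.nn as nn

class CrossAttentionBridge(nn.Module):
    """Learned block that lets an anchor-model layer attend over an
    augmenting-model layer. Only these parameters are trained."""
    def __init__(self, anchor_dim: int, aug_dim: int, num_heads: int = 8):
        super().__init__()
        # Project augmenting-model representations into the anchor's hidden size.
        self.proj = nn.Linear(aug_dim, anchor_dim)
        self.cross_attn = nn.MultiheadAttention(anchor_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(anchor_dim)

    def forward(self, anchor_h: torch.Tensor, aug_h: torch.Tensor) -> torch.Tensor:
        # anchor_h: (batch, seq_a, anchor_dim); aug_h: (batch, seq_b, aug_dim)
        keys_values = self.proj(aug_h)
        attended, _ = self.cross_attn(query=anchor_h, key=keys_values, value=keys_values)
        # Residual add keeps the anchor's original representation intact.
        return self.norm(anchor_h + attended)


if __name__ == "__main__":
    # Dummy hidden states standing in for activations from two frozen models.
    batch, seq_a, seq_b = 2, 16, 16
    anchor_dim, aug_dim = 1024, 512
    bridge = CrossAttentionBridge(anchor_dim, aug_dim)
    anchor_hidden = torch.randn(batch, seq_a, anchor_dim)  # from the frozen anchor LLM
    aug_hidden = torch.randn(batch, seq_b, aug_dim)        # from the frozen augmenting LLM
    fused = bridge(anchor_hidden, aug_hidden)              # fed into the anchor's next layer
    print(fused.shape)  # torch.Size([2, 16, 1024])
```

In practice a few such bridges would be inserted at selected layer pairs and trained on a small amount of composition data, which is where the "few additional parameters and data" in the abstract come from.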

Real-World Applications

CALM's utility spans several domains, notably low-resource languages and code generation. The researchers demonstrated the method's efficacy by augmenting a foundational LLM (PaLM2-S) with a smaller model trained on low-resource languages, which yielded marked improvements (up to 13% absolute) in translation into English and arithmetic reasoning for those languages. Similarly, composing the foundational LLM with a code-specific model produced a significant improvement (roughly 40% relative over the base model) on code generation and code explanation tasks.

Comparison with Existing Techniques

Parameter-efficient fine-tuning and model merging have been explored in prior work, but these methods are often restricted by requirements such as alignment between the original models. CALM bypasses these limitations, offering a versatile and powerful way to combine diverse pre-trained models regardless of their size and pre-training objectives. The reported results show that the composed model outperforms both the individual foundational and specialized models, rivaling even fully fine-tuned counterparts.

To summarize, CALM expands the abilities of LLMs in a cost- and resource-conscious manner, enabling existing models to be adapted swiftly to new tasks and domains.
