
LLM Augmented LLMs: Expanding Capabilities through Composition (2401.02412v1)

Published 4 Jan 2024 in cs.LG, cs.AI, cs.CL, and cs.CV

Abstract: Foundational models with billions of parameters which have been trained on large corpora of data have demonstrated non-trivial skills in a variety of domains. However, due to their monolithic structure, it is challenging and expensive to augment them or impart new skills. On the other hand, due to their adaptation abilities, several new instances of these models are being trained towards new domains and tasks. In this work, we study the problem of efficient and practical composition of existing foundation models with more specific models to enable newer capabilities. To this end, we propose CALM -- Composition to Augment LLMs -- which introduces cross-attention between models to compose their representations and enable new capabilities. Salient features of CALM are: (i) Scales up LLMs on new tasks by 're-using' existing LLMs along with a few additional parameters and data, (ii) Existing model weights are kept intact, and hence preserves existing capabilities, and (iii) Applies to diverse domains and settings. We illustrate that augmenting PaLM2-S with a smaller model trained on low-resource languages results in an absolute improvement of up to 13% on tasks like translation into English and arithmetic reasoning for low-resource languages. Similarly, when PaLM2-S is augmented with a code-specific model, we see a relative improvement of 40% over the base model for code generation and explanation tasks -- on-par with fully fine-tuned counterparts.

Authors (9)
  1. Rachit Bansal (9 papers)
  2. Bidisha Samanta (14 papers)
  3. Siddharth Dalmia (36 papers)
  4. Nitish Gupta (27 papers)
  5. Shikhar Vashishth (23 papers)
  6. Sriram Ganapathy (72 papers)
  7. Abhishek Bapna (1 paper)
  8. Prateek Jain (131 papers)
  9. Partha Talukdar (51 papers)
Citations (26)

Summary

Overview of CALM Framework

Expanding what LLMs can do without incurring excessive cost has become a topic of intense research. The prevailing strategy is to pre-train LLMs on vast datasets to establish a broad foundation of skills and then fine-tune them for specific tasks. However, the monolithic architecture and sheer size of these models make it burdensome to inject new capabilities. An emerging approach explored in recent research instead composes existing foundational models with specialized counterparts, broadening functionality while remaining cost-effective.

Efficient Model Composition

The paper introduces CALM (Composition to Augment LLMs), a method that uses cross-attention to combine the representations of a foundational LLM with those of a specialized LLM. A pivotal advantage of the technique is that the pre-trained weights of the foundational model are kept intact, so previously acquired skills are preserved. With CALM, the composed model scales to novel tasks by re-using an existing domain-specific model together with only a small number of additional parameters and a modest amount of training data.
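
To make the composition concrete, the snippet below is a minimal, illustrative PyTorch sketch of a single cross-attention "bridge" between hidden states drawn from a frozen anchor LLM and a frozen augmenting LLM. The module names, hidden sizes, and number of heads are hypothetical and not the paper's exact configuration; the point is that only the bridge's parameters would be trained while both base models stay frozen.

```python
# Hypothetical sketch of CALM-style composition; layer choices and dimensions
# are illustrative, not the paper's configuration.
import torch
import torch.nn as nn

class CrossAttentionBridge(nn.Module):
    """Learned block that lets an anchor-model layer attend over an
    augmenting-model layer. Only these parameters are trained."""
    def __init__(self, anchor_dim: int, aug_dim: int, num_heads: int = 8):
        super().__init__()
        # Project augmenting-model representations into the anchor's hidden size.
        self.proj = nn.Linear(aug_dim, anchor_dim)
        self.cross_attn = nn.MultiheadAttention(anchor_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(anchor_dim)

    def forward(self, anchor_h: torch.Tensor, aug_h: torch.Tensor) -> torch.Tensor:
        # anchor_h: (batch, seq_a, anchor_dim); aug_h: (batch, seq_b, aug_dim)
        keys_values = self.proj(aug_h)
        attended, _ = self.cross_attn(query=anchor_h, key=keys_values, value=keys_values)
        # Residual add keeps the anchor's original representation intact.
        return self.norm(anchor_h + attended)


if __name__ == "__main__":
    # Dummy hidden states standing in for activations from two frozen models.
    batch, seq_a, seq_b = 2, 16, 16
    anchor_dim, aug_dim = 1024, 512
    bridge = CrossAttentionBridge(anchor_dim, aug_dim)
    anchor_hidden = torch.randn(batch, seq_a, anchor_dim)  # from the frozen anchor LLM
    aug_hidden = torch.randn(batch, seq_b, aug_dim)        # from the frozen augmenting LLM
    fused = bridge(anchor_hidden, aug_hidden)              # fed into the anchor's next layer
    print(fused.shape)  # torch.Size([2, 16, 1024])
```

In practice a few such bridges would be inserted at selected layer pairs and trained on a small amount of composition data, which is where the "few additional parameters and data" in the abstract come from.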

Real-World Applications

CALM's utility spans several domains, notably low-resource languages and code generation. The researchers demonstrated the method's efficacy by augmenting a foundational LLM (PaLM2-S) with a smaller model trained on low-resource languages, which yielded marked improvements (up to 13% absolute) in translation into English and arithmetic reasoning for those languages. Similarly, composing the foundational LLM with a code-specific model produced a significant improvement (roughly 40% relative over the base model) on code generation and code explanation tasks.

Comparison with Existing Techniques

Parameter-efficient fine-tuning and model merging have been explored in prior work, but these methods are often restricted by requirements such as alignment between the original models. CALM bypasses these limitations, offering a versatile and powerful way to combine diverse pre-trained models regardless of their size and pre-training objectives. The reported results show that the composed model outperforms both the individual foundational and specialized models, rivaling even fully fine-tuned counterparts.

To summarize, CALM expands the abilities of LLMs in a cost- and resource-conscious manner, enabling existing models to be adapted swiftly to new tasks and domains.
