- The paper introduces Atlas, a model that combines a retrieval mechanism with a sequence-to-sequence architecture to enhance few-shot learning on knowledge-intensive tasks.
- The paper details a joint pre-training approach on retrieval objectives and masked language modeling, reducing dependency on massive model parameters.
- The paper reports state-of-the-art results on benchmarks like MMLU and NaturalQuestions through efficient fine-tuning and re-ranking strategies.
Few-shot Learning with Retrieval Augmented Language Models: An Overview of Atlas
The paper "Atlas: Few-shot Learning with Retrieval Augmented Language Models" presents Atlas, a retrieval-augmented LLM designed to tackle knowledge-intensive tasks with minimal training samples. This approach challenges the typical reliance on sheer parameter count by integrating a retrieval mechanism that accesses external knowledge, thereby enhancing few-shot learning abilities.
Core Contributions
- Design of Atlas: Atlas pairs a sequence-to-sequence architecture with a dense retriever based on the Contriever model. The retriever identifies relevant documents, which the LLM processes jointly with the input to generate the output using the Fusion-in-Decoder architecture.
- Training Methodology: The model is jointly pre-trained on retrieval-based objectives and masked language modeling, with a document index built from Wikipedia and Common Crawl data. This pre-training teaches the model to make effective use of retrieved documents, which underpins its few-shot learning performance. Among the retriever training losses studied, Perplexity Distillation, which trains the retriever to prefer documents that improve the LLM's likelihood of the target, was favored for training the retriever and LLM in conjunction.
- Efficiency in Fine-tuning: The paper explores strategies such as query-side fine-tuning and re-ranking to avoid the cost of re-embedding the entire document index each time the retriever is updated during training. These strategies strike a balance between performance and efficiency, which is particularly valuable in few-shot setups.
- Performance Evaluation: Atlas demonstrates strong results across multiple benchmarks, including KILT, MMLU, and open-domain QA datasets like NaturalQuestions and TriviaQA. It achieves substantial few-shot and full-dataset performance, setting new state-of-the-art results in several tasks.
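The Fusion-in-Decoder scheme used by Atlas can be sketched roughly as follows: each retrieved document is encoded independently together with the query, and the decoder then cross-attends over the concatenation of all encoder outputs. This is a toy illustration, with deterministic random features standing in for a real T5 encoder; the function names and dimensions are hypothetical, not from the paper.

```python
import numpy as np

D_MODEL = 8  # toy hidden size (assumption; the real model builds on T5)

def encode(text: str, n_tokens: int = 4) -> np.ndarray:
    """Stand-in encoder: deterministic random features per string.
    A real system would run a T5 encoder here."""
    seed = abs(hash(text)) % (2**32)
    return np.random.default_rng(seed).normal(size=(n_tokens, D_MODEL))

def fusion_in_decoder_memory(query: str, documents: list[str]) -> np.ndarray:
    # 1. Encode each (query, document) pair independently -- encoder cost
    #    grows linearly with the number of retrieved documents.
    per_doc = [encode(f"question: {query} context: {d}") for d in documents]
    # 2. Concatenate all encoder outputs along the sequence axis; the
    #    decoder cross-attends over this single fused memory.
    return np.concatenate(per_doc, axis=0)

memory = fusion_in_decoder_memory("who wrote Hamlet?", ["doc A", "doc B", "doc C"])
print(memory.shape)  # (3 docs x 4 tokens, D_MODEL)
```

The key property is that encoding is independent per document (cheap), while fusion happens only in the decoder's cross-attention (where evidence from all documents can be combined).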
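Perplexity Distillation, mentioned above as the favored retriever objective, can be illustrated with a small numeric sketch: the retriever's document distribution is pushed, via a KL term, toward a target distribution that puts more mass on documents under which the LLM assigns the answer a higher likelihood. All numbers below are invented for illustration.

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    z = np.exp(x - x.max())
    return z / z.sum()

# Toy values for K=4 retrieved documents (illustrative, not from the paper).
retriever_scores = np.array([2.0, 1.0, 0.5, -1.0])      # s(q, d_k), e.g. dot products
lm_log_likelihood = np.array([-1.2, -0.4, -3.0, -2.5])  # log p_LM(answer | q, d_k)

# Target: documents that help the language model most get more mass.
target = softmax(lm_log_likelihood)
p_retr = softmax(retriever_scores)

# Distillation loss: KL(target || retriever). In training, gradients flow
# only into the retriever scores; the LM signal acts as fixed supervision.
kl = float(np.sum(target * (np.log(target) - np.log(p_retr))))
print(kl)
```

Minimizing this loss makes the retriever rank documents the way the LLM "wishes" they were ranked, without needing labeled relevance judgments.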
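The query-side fine-tuning and re-ranking ideas can be sketched as a two-stage lookup against a frozen index: only the query encoder is trained, and a second stage re-scores a small candidate set so the full index never needs re-embedding. All names and sizes here are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(1)
N_DOCS, DIM, TOP_L, TOP_K = 100, 16, 10, 4  # toy sizes (assumptions)

# Precomputed document embeddings: with query-side fine-tuning these stay
# frozen, so the index never has to be rebuilt during training.
doc_index = rng.normal(size=(N_DOCS, DIM))

def retrieve_and_rerank(q_vec, W, fresh_doc_encoder):
    # Only the query encoder (here a single trainable matrix W) is updated.
    q = W @ q_vec
    # Stage 1: cheap lookup of TOP_L candidates against the stale index.
    coarse = doc_index @ q
    candidates = np.argsort(-coarse)[:TOP_L]
    # Stage 2: re-embed only the candidates with the current document
    # encoder and re-score, avoiding a full index refresh.
    fresh = np.array([fresh_doc_encoder(doc_index[i]) for i in candidates])
    fine = fresh @ q
    return candidates[np.argsort(-fine)[:TOP_K]]

W = rng.normal(size=(DIM, DIM))
top = retrieve_and_rerank(rng.normal(size=DIM), W,
                          fresh_doc_encoder=lambda d: d * 1.0)  # identity stand-in
print(top)  # ids of the TOP_K re-ranked documents
```

The design choice is a classic accuracy-for-cost trade: stage 1 tolerates a slightly stale index because stage 2 corrects the ordering over a handful of candidates.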
Numerical Results and Analysis
- Few-shot Performance: Atlas, with only 11B parameters, surpasses GPT-3 by 4% on MMLU, and in the 64-shot setting outperforms substantially larger models on NaturalQuestions and TriviaQA.
- Task-Specific Results: On NaturalQuestions, Atlas reaches over 42% accuracy with just 64 training examples. On the KILT benchmark, Atlas outperforms many existing models in both few-shot and fully supervised setups.
- Implications of Retrieval: The integration of retrieval components is shown to effectively decouple memorization from generalization, providing a scalable alternative to massive parameter models.
Implications and Future Directions
The implications of Atlas' design are significant for the development of more adaptable and efficient AI systems. By reducing dependence on enormous parameter counts, retrieval-augmented models like Atlas offer a more practical paradigm for knowledge-intensive tasks: because much of the knowledge lives in the document index rather than solely in the weights, the index can be updated or swapped to keep the model current with little retraining.
Future directions could explore the integration of real-time web-based retrieval for dynamic knowledge updates, enhancing the model's ability to remain contextually accurate with evolving information.
In summary, the paper provides compelling evidence that retrieval augmentation not only strengthens the few-shot learning capability of LLMs but does so efficiently, potentially reshaping how models are built to handle knowledge-intensive scenarios. Atlas exemplifies a significant step forward in leveraging external knowledge sources effectively within AI systems.