A Survey of Large Language Models (2303.18223v15)

Published 31 Mar 2023 in cs.CL and cs.AI

Abstract: Language is essentially a complex, intricate system of human expressions governed by grammatical rules. It poses a significant challenge to develop capable AI algorithms for comprehending and mastering a language. As a major approach, language modeling has been widely studied for language understanding and generation in the past two decades, evolving from statistical language models to neural language models. Recently, pre-trained language models (PLMs) have been proposed by pre-training Transformer models over large-scale corpora, showing strong capabilities in solving various NLP tasks. Since researchers have found that model scaling can lead to performance improvement, they further study the scaling effect by increasing the model size to an even larger size. Interestingly, when the parameter scale exceeds a certain level, these enlarged models not only achieve a significant performance improvement but also show some special abilities that are not present in small-scale language models. To discriminate the difference in parameter scale, the research community has coined the term large language models (LLMs) for the PLMs of significant size. Recently, the research on LLMs has been largely advanced by both academia and industry, and a remarkable milestone is the launch of ChatGPT, which has attracted widespread attention from society. The technical evolution of LLMs has been making an important impact on the entire AI community, which would revolutionize the way we develop and use AI algorithms. In this survey, we review the recent advances of LLMs by introducing the background, key findings, and mainstream techniques. In particular, we focus on four major aspects of LLMs, namely pre-training, adaptation tuning, utilization, and capacity evaluation. Besides, we also summarize the available resources for developing LLMs and discuss the remaining issues for future directions.

An Expert Overview of "A Survey of Large Language Models"

Introduction

Wayne Xin Zhao et al. present a comprehensive review of LLMs, providing a historical context and systematic analysis of their recent advances—focusing particularly on the phases of pre-training, adaptation tuning, utilization, and capacity evaluation. The survey aims to furnish an up-to-date reference for researchers and engineers, thoroughly examining theoretical foundations, empirical strategies, resources, and future directions related to LLMs.

Evolution of LLMs

The paper outlines the evolution of language modeling through four major stages:

  1. Statistical Language Models (SLMs): Developed in the 1990s, SLMs relied on statistical methods under the Markov assumption and were widely used in tasks such as information retrieval and speech recognition.
  2. Neural Language Models (NLMs): Introduced distributed representations of text, leading to superior performance in various NLP tasks through methods such as word2vec and recurrent neural networks (RNNs).
  3. Pre-trained Language Models (PLMs): Leverage large-scale unlabeled corpora, typically with Transformer architectures. Examples include ELMo, BERT, and their successors, which follow the "pre-training and fine-tuning" paradigm.
  4. LLMs: Characterized by their significant parameter count (tens or hundreds of billions), LLMs exhibit emergent abilities that smaller models lack, such as in-context learning (ICL) and instruction following. Notable examples include GPT-3 and PaLM.

Key Technical Components

Pre-training

Pre-training establishes the foundation for LLMs:

  • Data Sources and Preparation: LLMs leverage diverse, large-scale datasets. Quality filtering and deduplication are essential preprocessing steps to ensure high-quality training data.
  • Architectures and Optimization: The Transformer is the backbone architecture of LLMs. Training stability, efficient data usage, and well-chosen computational strategies are crucial for successful pre-training.
  • Scaling Laws: Empirical scaling laws predict improved model performance as model and data sizes increase, underpinning the development strategy of LLMs (see the numeric sketch after this list).
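To make the scaling-law point concrete, the sketch below evaluates the compute-optimal scaling law of Hoffmann et al. ("Chinchilla") as one published instance of the laws the survey discusses. The fitted constants and the 70B/1.4T example are illustrative assumptions taken from that line of work, not values reported by this survey.

```python
# Minimal sketch of a pre-training scaling law (Hoffmann et al., "Chinchilla").
# The fitted constants are the published Chinchilla estimates, used purely for
# illustration; they are not values from this survey.

def chinchilla_loss(n_params: float, n_tokens: float,
                    E: float = 1.69, A: float = 406.4, B: float = 410.7,
                    alpha: float = 0.34, beta: float = 0.28) -> float:
    """Predicted pre-training loss L(N, D) = E + A / N**alpha + B / D**beta."""
    return E + A / n_params ** alpha + B / n_tokens ** beta

# A 70B-parameter model trained on 1.4T tokens vs. only 300B tokens:
# the law predicts that more training data lowers the achievable loss.
print(chinchilla_loss(70e9, 1.4e12))  # roughly 1.94
print(chinchilla_loss(70e9, 300e9))   # roughly 2.02
```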

Adaptation Tuning

Adaptation tuning comprises two key approaches:

  1. Instruction Tuning: Fine-tuning LLMs on instruction-formatted data to improve generalization and task-solving capabilities. Methods for building such data include manual annotation and synthetic data generation (see the formatting sketch after this list).
  2. Alignment Tuning: Focuses on aligning LLM outputs with human preferences and values through techniques like Reinforcement Learning from Human Feedback (RLHF), enhancing models’ helpfulness, honesty, and harmlessness.
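As a concrete view of instruction-formatted data, the sketch below turns one example into a single training string, roughly in the Alpaca style. The field names and template are illustrative assumptions rather than a recipe prescribed by the survey.

```python
# Hedged sketch: converting an instruction-formatted example into training text.
# Field names and the "### Instruction / Input / Response" template are assumed
# for illustration (Alpaca-like), not the survey's own format.

example = {
    "instruction": "Summarize the following paragraph in one sentence.",
    "input": "LLMs are pre-trained Transformer models with tens of billions of parameters ...",
    "output": "LLMs are very large pre-trained Transformers that show emergent abilities.",
}

def build_training_text(ex: dict) -> str:
    """Concatenate instruction, optional input, and the target response."""
    prompt = f"### Instruction:\n{ex['instruction']}\n"
    if ex.get("input"):
        prompt += f"\n### Input:\n{ex['input']}\n"
    # During instruction tuning, the model is trained to generate the text
    # that follows "### Response:".
    return prompt + f"\n### Response:\n{ex['output']}"

print(build_training_text(example))
```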

Utilization

Proper utilization of LLMs hinges on effective prompting strategies:

  • In-Context Learning (ICL): Uses task descriptions and examples to prompt LLMs without additional training. Effective demonstration selection and formatting are crucial.
  • Chain-of-Thought (CoT) Prompting: Inserts intermediate reasoning steps into prompts, significantly improving LLMs’ performance on complex reasoning tasks. Variants include sampling- and verification-based methods, which address pitfalls such as task complexity and lack of supervision (see the prompting sketch after this list).
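The sketch below assembles a few-shot prompt with an optional chain-of-thought rationale in the demonstration. The task, demonstration, and exact formatting are assumptions for illustration; only the overall pattern (task description, demonstrations, test query) follows the survey's description of ICL and CoT prompting.

```python
# Hedged sketch: building an in-context-learning prompt, optionally with
# chain-of-thought rationales. Task and demonstrations are invented examples.

task_description = "Answer the arithmetic word problem."

demonstrations = [
    {
        "question": "Tom has 3 apples and buys 2 more. How many apples does he have?",
        "rationale": "He starts with 3 apples and gains 2, so 3 + 2 = 5.",
        "answer": "5",
    },
]

def build_prompt(query: str, use_cot: bool = True) -> str:
    """Format: task description, few-shot demonstrations, then the test query."""
    parts = [task_description, ""]
    for demo in demonstrations:
        parts.append(f"Q: {demo['question']}")
        if use_cot:
            # CoT prompting places the intermediate reasoning before the answer.
            parts.append(f"A: {demo['rationale']} The answer is {demo['answer']}.")
        else:
            parts.append(f"A: The answer is {demo['answer']}.")
        parts.append("")
    parts.append(f"Q: {query}")
    parts.append("A:")
    return "\n".join(parts)

print(build_prompt("A train has 4 cars with 20 seats each. How many seats in total?"))
```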

Implications and Future Directions

Practical Applications

LLMs have shown tremendous potential in various fields, including NLP, information retrieval, and computer vision:

  • Integration with tools and plugins to expand functionalities (e.g., ChatGPT plugins).
  • Application to domain-specific challenges such as scientific discovery, healthcare, and legal services.

Theoretical Insights

Theoretical understanding of emergent abilities and scaling laws guides the development and utilization of LLMs:

  • Further investigation into the underlying principles of emergent abilities could foster advancements in AI capabilities.
  • Exploration of novel architectures and continuous learning paradigms can lead to more efficient and adaptable models.

Challenges and Limitations

Despite their capabilities, LLMs pose certain challenges:

  • Data Scarcity: With ever-growing model sizes, the availability of high-quality training data becomes a potential bottleneck.
  • Alignment Issues: Ensuring that LLMs align with human values without losing performance remains an ongoing concern.
  • Computational Costs: Training and deploying LLMs require substantial compute resources, highlighting the need for efficient training techniques and hardware advancements.

Conclusion

The survey by Zhao et al. provides an extensive overview of the state-of-the-art in LLMs, covering their historical context, technical underpinnings, practical applications, and future trajectories. By addressing both achievements and challenges, it serves as an invaluable resource for researchers and engineers looking to navigate the rapidly evolving landscape of LLMs.

Authors (22)
  1. Wayne Xin Zhao (196 papers)
  2. Kun Zhou (217 papers)
  3. Junyi Li (92 papers)
  4. Tianyi Tang (30 papers)
  5. Xiaolei Wang (44 papers)
  6. Yupeng Hou (33 papers)
  7. Yingqian Min (14 papers)
  8. Beichen Zhang (27 papers)
  9. Junjie Zhang (79 papers)
  10. Zican Dong (12 papers)
  11. Yifan Du (21 papers)
  12. Chen Yang (193 papers)
  13. Yushuo Chen (15 papers)
  14. Zhipeng Chen (46 papers)
  15. Jinhao Jiang (25 papers)
  16. Ruiyang Ren (18 papers)
  17. Yifan Li (106 papers)
  18. Xinyu Tang (20 papers)
  19. Zikang Liu (11 papers)
  20. Peiyu Liu (27 papers)
Citations (1,967)