Pre-Trained Models: Past, Present and Future (2106.07139v3)

Published 14 Jun 2021 in cs.AI and cs.CL

Abstract: Large-scale pre-trained models (PTMs) such as BERT and GPT have recently achieved great success and become a milestone in the field of AI. Owing to sophisticated pre-training objectives and huge model parameters, large-scale PTMs can effectively capture knowledge from massive labeled and unlabeled data. By storing knowledge into huge parameters and fine-tuning on specific tasks, the rich knowledge implicitly encoded in huge parameters can benefit a variety of downstream tasks, which has been extensively demonstrated via experimental verification and empirical analysis. It is now the consensus of the AI community to adopt PTMs as backbone for downstream tasks rather than learning models from scratch. In this paper, we take a deep look into the history of pre-training, especially its special relation with transfer learning and self-supervised learning, to reveal the crucial position of PTMs in the AI development spectrum. Further, we comprehensively review the latest breakthroughs of PTMs. These breakthroughs are driven by the surge of computational power and the increasing availability of data, towards four important directions: designing effective architectures, utilizing rich contexts, improving computational efficiency, and conducting interpretation and theoretical analysis. Finally, we discuss a series of open problems and research directions of PTMs, and hope our view can inspire and advance the future study of PTMs.

Insightful Overview of "Pre-Trained Models: Past, Present and Future"

The paper "Pre-Trained Models: Past, Present and Future" offers a comprehensive survey of large pre-trained models (PTMs) such as BERT and GPT, which are considered pivotal in advancing the field of AI. The authors systematically dissect the evolution, current state, and future directions of PTMs, emphasizing their crucial interplay with transfer learning and self-supervised learning methodologies.

The paper highlights several core aspects:

Historical Context and Development

The authors first trace the historical trajectory of PTMs, illustrating how pre-training methodologies have been integral to AI's progression. Initially, pre-training relied on supervised learning over large labeled datasets. The exponential growth in data availability and computational power, however, has shifted the focus towards self-supervised pre-training. This transition enables PTMs to encapsulate vast amounts of implicit knowledge within their parameters, knowledge that can then be transferred to downstream tasks via fine-tuning.
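
As a minimal illustration of this pre-train-then-fine-tune recipe (not a method from the paper), the sketch below fine-tunes a pre-trained encoder for classification with the Hugging Face transformers library; the checkpoint name, toy batch, and hyperparameters are assumptions chosen for brevity.

```python
# Minimal fine-tuning sketch (assumptions: transformers + PyTorch installed,
# "bert-base-uncased" as the pre-trained backbone, a toy two-example batch).
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2  # reuse pre-trained weights, add a new task head
)

# Toy labeled batch standing in for a downstream dataset.
batch = tokenizer(["great movie", "terrible plot"], padding=True, return_tensors="pt")
labels = torch.tensor([1, 0])

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
outputs = model(**batch, labels=labels)  # loss computed against the task labels
outputs.loss.backward()                  # gradients flow into the pre-trained weights
optimizer.step()
optimizer.zero_grad()
```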

Breakthroughs and Innovations in PTMs

Focusing on recent advancements, the paper categorizes them into four domains:

  1. Architectural Innovations: Examination of novel architectures beyond BERT and GPT, such as XLNet and RoBERTa. These models aim to enhance PTMs by integrating new design principles, including combining autoregressive and autoencoding modeling techniques.
  2. Utilization of Diverse Contexts: Development of multilingual and multimodal PTMs. This involves leveraging parallel corpora and integrating text with other data types like images, facilitating a more robust understanding across languages and contexts.
  3. Improving Computational Efficiency: Strategies to enhance the efficiency of PTMs span system-level optimizations such as mixed-precision training and distributed training techniques, alongside algorithmic innovations like model pruning and parameter sharing (a minimal mixed-precision sketch follows this list).
  4. Interpretation and Analysis: The paper underscores efforts to demystify PTMs, focusing on the linguistic and world knowledge these models capture, and discusses theoretical analyses aimed at better understanding PTMs' strong generalization capabilities.
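
As a concrete illustration of the system-level efficiency techniques in item 3, the following is a minimal mixed-precision training sketch using PyTorch's automatic mixed precision utilities; the tiny linear model and random data are stand-ins for a real PTM and corpus, and AMP simply falls back to full precision when no GPU is available.

```python
# Minimal mixed-precision training sketch with PyTorch AMP.
# Assumptions: PyTorch installed; a toy linear model and random data
# stand in for a real pre-trained model and corpus.
import torch
from torch.cuda.amp import GradScaler, autocast

device = "cuda" if torch.cuda.is_available() else "cpu"
use_amp = device == "cuda"  # AMP is active only on GPU; otherwise plain FP32

model = torch.nn.Linear(512, 512).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = GradScaler(enabled=use_amp)  # rescales the loss to avoid FP16 underflow

for _ in range(3):  # a few illustrative steps
    x = torch.randn(32, 512, device=device)
    optimizer.zero_grad()
    with autocast(enabled=use_amp):   # forward pass runs in FP16 where safe
        loss = model(x).pow(2).mean()
    scaler.scale(loss).backward()     # backward on the scaled loss
    scaler.step(optimizer)            # unscales gradients, then steps
    scaler.update()                   # adjusts the scale factor for the next step
```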

Open Challenges and Future Directions

Several unresolved issues and potential future research directions are identified:

  • Architectural Design: The need for more adaptable and efficient model architectures, possibly involving neural architecture search, is emphasized.
  • Enhanced Pre-Training Tasks: Exploration of pre-training tasks that better align model capabilities with real-world application demands.
  • Adaptation Mechanisms: Beyond mere fine-tuning, methods like continuous prompt tuning are highlighted as avenues to improve the adaptability of PTMs in various domains (see the sketch after this list).
  • Cross-Modal and Multilingual Integration: Encouragement towards expanding PTMs' applicability by incorporating richer and more diverse datasets, extending to multilingual and multimodal frameworks.
  • Understanding and Exploiting Modeledge: The implicit, continuous knowledge stored in PTMs, termed "modeledge," should be better captured and exploited; the authors advocate frameworks such as the universal continuous knowledge base (UCKB) for storing and transferring it.
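
To make the continuous (soft) prompt tuning mentioned under Adaptation Mechanisms concrete, here is a minimal sketch in which the pre-trained backbone is frozen and only a small matrix of prompt embeddings plus a task head are trained; the class name, prompt length, and backbone checkpoint are illustrative assumptions rather than the paper's specific method.

```python
# Minimal soft-prompt tuning sketch: the pre-trained backbone stays frozen and
# only `prompt` (a few trainable embedding vectors) plus a task head are learned.
# Assumptions: Hugging Face transformers + PyTorch; "bert-base-uncased" backbone.
import torch
from transformers import AutoModel, AutoTokenizer

class SoftPromptClassifier(torch.nn.Module):
    def __init__(self, backbone="bert-base-uncased", prompt_len=20, num_labels=2):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(backbone)
        for p in self.encoder.parameters():
            p.requires_grad = False                      # freeze the PTM
        hidden = self.encoder.config.hidden_size
        self.prompt = torch.nn.Parameter(0.02 * torch.randn(prompt_len, hidden))
        self.head = torch.nn.Linear(hidden, num_labels)

    def forward(self, input_ids, attention_mask):
        tok_emb = self.encoder.get_input_embeddings()(input_ids)       # (B, T, H)
        prompt = self.prompt.unsqueeze(0).expand(input_ids.size(0), -1, -1)
        inputs_embeds = torch.cat([prompt, tok_emb], dim=1)            # prepend prompt
        prompt_mask = torch.ones(
            input_ids.size(0), self.prompt.size(0),
            dtype=attention_mask.dtype, device=attention_mask.device,
        )
        mask = torch.cat([prompt_mask, attention_mask], dim=1)
        out = self.encoder(inputs_embeds=inputs_embeds, attention_mask=mask)
        # Use the first (prompt) position as a simple pooled representation.
        return self.head(out.last_hidden_state[:, 0])

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = SoftPromptClassifier()
batch = tokenizer(["an example sentence"], return_tensors="pt")
logits = model(batch["input_ids"], batch["attention_mask"])
```

Because only the prompt embeddings and the head require gradients, the per-task storage and training cost is a small fraction of fine-tuning the full model, which is the main appeal of such adaptation methods.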

Implications and Prospects

The continual scaling of PTMs suggests immense potential for further breakthroughs in AI. However, the paper also cautions that challenges remain, notably the computational cost of training ever-larger models and the difficulty of understanding their internal mechanisms. It anticipates further developments in areas such as cognition-inspired architectures and novel applications built on domain-specific fine-tuning.

In conclusion, the paper offers a foundational understanding of the past and present landscape of PTMs while proposing a forward-looking view on future research directions, thus serving as a detailed guide for experienced researchers in AI.

Authors (24)
  1. Xu Han (270 papers)
  2. Zhengyan Zhang (46 papers)
  3. Ning Ding (122 papers)
  4. Yuxian Gu (21 papers)
  5. Xiao Liu (402 papers)
  6. Yuqi Huo (19 papers)
  7. Jiezhong Qiu (29 papers)
  8. Yuan Yao (292 papers)
  9. Ao Zhang (45 papers)
  10. Liang Zhang (357 papers)
  11. Wentao Han (6 papers)
  12. Minlie Huang (225 papers)
  13. Qin Jin (94 papers)
  14. Yanyan Lan (87 papers)
  15. Yang Liu (2253 papers)
  16. Zhiyuan Liu (433 papers)
  17. Zhiwu Lu (51 papers)
  18. Xipeng Qiu (257 papers)
  19. Ruihua Song (48 papers)
  20. Jie Tang (302 papers)
Citations (717)