A Comprehensive Survey on Pretrained Foundation Models
The paper "A Comprehensive Survey on Pretrained Foundation Models: A History from BERT to ChatGPT" by Zhou et al. offers an exhaustive examination of Pretrained Foundation Models (PFMs) and their evolution and application across various domains. Over time, PFMs have established themselves as pivotal in artificial intelligence, fundamentally impacting NLP, computer vision (CV), graph learning (GL), and more.
Overview and Core Contributions
PFMs are large-scale models pretrained on extensive datasets, yielding potent representations that transfer to numerous downstream tasks and domains. These models, such as BERT, GPT, and ChatGPT, have delivered substantial performance gains by enabling rapid adaptation through fine-tuning or zero-shot learning. The paper dissects the impetus behind PFMs, focusing on their capacity to learn rich feature representations and to overcome the scarcity of labeled data that constrained earlier models built on convolutional and recurrent modules.
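The adaptation workflow described here can be made concrete with a short sketch. The snippet below is a minimal, illustrative example rather than anything from the survey: it assumes the Hugging Face transformers library, the bert-base-uncased checkpoint, and a toy two-example sentiment dataset, and it simply attaches a classification head to a pretrained encoder and runs a few fine-tuning steps.

```python
# Minimal fine-tuning sketch: adapt a pretrained BERT encoder to a downstream
# classification task. The checkpoint and the in-memory dataset are placeholders;
# real use would swap in a proper dataset, evaluation loop, and trainer.
import torch
from torch.optim import AdamW
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "bert-base-uncased"  # any pretrained checkpoint could be used here
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Toy labeled examples standing in for a downstream task's training set.
texts = ["the movie was great", "the movie was terrible"]
labels = torch.tensor([1, 0])

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
optimizer = AdamW(model.parameters(), lr=2e-5)

model.train()
for _ in range(3):  # a few gradient steps; real fine-tuning runs for epochs
    outputs = model(**batch, labels=labels)
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

The same pretrained weights can instead be used zero-shot (no gradient updates), which is the other adaptation route the paper emphasizes.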
The survey is structured into several key components:
- Historical Context and Evolution: From BERT's bidirectional masked-language-modeling framework to GPT's autoregressive mechanisms, the paper traces the innovations that shaped PFMs. Models like ChatGPT demonstrate remarkable capabilities in prompt-based learning, built on the autoregressive paradigm (the two pretraining objectives are contrasted in the sketch after this list).
- Multimodal and Domain-Specific Advancements: Beyond text-centric applications, the paper surveys PFMs in image and graph settings, alongside emerging areas such as speech and video. It further examines unified PFMs, which integrate multiple modalities under a single framework to serve varied data types and application domains.
- Model Efficiency and Compression: Addressing the computational demands posed by PFMs, strategies for model efficiency and parameter reduction are explored. The paper highlights advancements in architecture designs and training methodologies to mitigate these challenges while maintaining robust performance standards.
- Security, Privacy, and Ethical Considerations: Because of their scale and often opaque nature, PFMs are scrutinized for potential security vulnerabilities, privacy implications, and biases. The survey elucidates challenges such as adversarial attacks and the memorization of private training data within these expansive models.
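To make the BERT-versus-GPT distinction in the first bullet concrete, the following sketch computes both pretraining losses on a single sentence. It is an illustrative assumption rather than code from the survey: it relies on the Hugging Face transformers library and the public bert-base-uncased and gpt2 checkpoints, and the masked position is chosen arbitrarily.

```python
# Contrast of the two dominant pretraining objectives: BERT-style masked
# language modeling (bidirectional context) versus GPT-style autoregressive
# next-token prediction (left-to-right context only).
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM, AutoModelForCausalLM

text = "Pretrained foundation models transfer to many downstream tasks."

# Masked LM (BERT): hide one token and predict it from both directions.
mlm_tok = AutoTokenizer.from_pretrained("bert-base-uncased")
mlm = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
enc = mlm_tok(text, return_tensors="pt")
labels = torch.full_like(enc["input_ids"], -100)   # -100 = ignored by the loss
labels[0, 3] = enc["input_ids"][0, 3]              # supervise only the masked slot
enc["input_ids"][0, 3] = mlm_tok.mask_token_id     # corrupt that position
mlm_loss = mlm(**enc, labels=labels).loss

# Causal LM (GPT): predict every token from its left context; the labels are the
# inputs themselves, shifted internally by the model.
clm_tok = AutoTokenizer.from_pretrained("gpt2")
clm = AutoModelForCausalLM.from_pretrained("gpt2")
ids = clm_tok(text, return_tensors="pt")
clm_loss = clm(**ids, labels=ids["input_ids"]).loss

print(f"masked-LM loss: {mlm_loss.item():.2f}, causal-LM loss: {clm_loss.item():.2f}")
```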
Implications for Research and Practice
The multifaceted examination not only highlights the technological strides made within the field of PFMs but also underscores the theoretical gaps and computational trade-offs that persist. For instance, aligning pretraining tasks more closely with downstream goals remains crucial, as does developing methodologies that adapt seamlessly across languages and modalities, including images and graphs.
In practice, the paper implies that future advancements will lean heavily on improving interpretability, robustness, and ethical alignment, echoing the broader AI community's call for accountable and transparent model deployment.
The Path Forward
Looking ahead, PFMs present manifold opportunities and challenges:
- Multimodal and Multilingual Synergies: Refining methodologies that unify language, vision, and other sensory inputs is pivotal. This entails creating datasets and pretraining tasks that naturally align these modalities with coherent semantic and functional grounding. Additionally, scaling these approaches across languages remains a non-trivial endeavor.
- Energy Efficiency and Scalability: Despite their prowess, PFMs are resource-intensive. Future work on PFM scalability must balance computational feasibility with accessibility, ensuring that robust models are not reserved solely for organizations with vast computational resources. Innovations in model compression and neural architecture design are likely to play a crucial role in striking this balance; a minimal knowledge-distillation sketch follows this list.
- Ethics and Fairness: As PFMs increasingly inform societal-scale applications, embedding fairness and minimizing biases become non-negotiable. Developing frameworks for assessing and mitigating biases throughout the training pipeline, from data selection to algorithmic inference, is necessary for ethical AI deployment.
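As a concrete illustration of the compression direction mentioned above, the sketch below trains a small student network to mimic a larger pretrained teacher via knowledge distillation. It is a minimal sketch under stated assumptions, not a method prescribed by the survey: the layer sizes, temperature, and random batch are all illustrative placeholders.

```python
# Minimal knowledge-distillation sketch: a compact "student" is trained to match
# the softened output distribution of a larger pretrained "teacher" model.
# All dimensions, the temperature, and the random batch are illustrative.
import torch
import torch.nn.functional as F

teacher = torch.nn.Sequential(               # stand-in for a large pretrained model
    torch.nn.Linear(128, 512), torch.nn.ReLU(), torch.nn.Linear(512, 10)
)
student = torch.nn.Linear(128, 10)           # much smaller model, cheap to deploy
optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
temperature = 2.0

x = torch.randn(32, 128)                     # one batch of input features

with torch.no_grad():                        # the teacher stays frozen
    teacher_logits = teacher(x)

student_logits = student(x)
# KL divergence between temperature-softened teacher and student distributions;
# the T^2 factor keeps gradient magnitudes comparable across temperatures.
loss = F.kl_div(
    F.log_softmax(student_logits / temperature, dim=-1),
    F.softmax(teacher_logits / temperature, dim=-1),
    reduction="batchmean",
) * temperature ** 2

loss.backward()
optimizer.step()
```

In practice this distillation loss is usually combined with the ordinary task loss on labeled data, and pruning or quantization can be applied on top for further savings.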
Conclusion
The paper by Zhou et al. provides a rigorous framework and an expansive view of the domain of Pretrained Foundation Models, encapsulating their evolution and areas of application while charting a path for future research built on scalability, ethical responsibility, and technical innovation. This comprehensive coverage serves as a cornerstone for researchers and practitioners committed to exploring the vast potential, and addressing the inherent challenges, of PFMs.