AMMUS : A Survey of Transformer-based Pretrained Models in Natural Language Processing (2108.05542v2)

Published 12 Aug 2021 in cs.CL

Abstract: Transformer-based pretrained language models (T-PTLMs) have achieved great success in almost every NLP task. The evolution of these models started with GPT and BERT. These models are built on top of transformers, self-supervised learning and transfer learning. Transformer-based PTLMs learn universal language representations from large volumes of text data using self-supervised learning and transfer this knowledge to downstream tasks. These models provide good background knowledge for downstream tasks, which avoids training downstream models from scratch. In this comprehensive survey paper, we initially give a brief overview of self-supervised learning. Next, we explain various core concepts such as pretraining, pretraining methods, pretraining tasks, embeddings and downstream adaptation methods. Next, we present a new taxonomy of T-PTLMs and then give a brief overview of various benchmarks, both intrinsic and extrinsic. We present a summary of various useful libraries for working with T-PTLMs. Finally, we highlight some of the future research directions which will further improve these models. We strongly believe that this comprehensive survey paper will serve as a good reference to learn the core concepts as well as to stay updated with recent developments in T-PTLMs.

Authors (3)
Citations (243)

Summary

AMMUS: Comprehensive Survey of Transformer-based Pretrained Language Models in NLP

The paper "AMMUS: A Survey of Transformer-based Pretrained Models in Natural Language Processing" provides an in-depth survey of Transformer-based Pretrained LLMs (T-PTLMs) and their evolution within the field of NLP. The authors discuss foundational concepts, methodologies, and the trajectory of research in T-PTLMs, making it a valuable reference for researchers aiming to gain a deeper understanding or keep abreast with advancements in T-PTLMs.

The authors organize the paper into several key sections:

  1. Self-Supervised Learning: This section introduces the reader to self-supervised learning (SSL), the backbone of T-PTLM development. SSL allows models to learn from large volumes of unlabeled text by solving predictive tasks and eliminates the need for expensive labeled data. The paper explores the types and benefits of SSL in the context of T-PTLMs.
  2. Core Concepts of T-PTLMs: An exploration of pretraining essentials, including methods such as pretraining from scratch, continual pretraining, and task-adaptive pretraining. The paper elucidates diverse pretraining objectives beyond the original masked language modeling (MLM) task, such as sequence-to-sequence learning and discriminative objectives (a minimal MLM sketch follows this list).
  3. Taxonomy of T-PTLMs: The authors propose a taxonomy that categorizes T-PTLMs based on pretraining corpus, architecture, SSL types, and model extensions. This categorization aids in understanding the landscape of T-PTLM development across various domains and languages.
  4. Downstream Adaptation Methods: Different strategies, like fine-tuning and prompt-based tuning, are discussed. Fine-tuning updates a pretrained model's parameters on task-specific labeled data, while prompt-based tuning is emerging as a promising alternative that recasts downstream tasks as textual prompts closely aligned with the original language modeling objective (see the fine-tuning sketch after this list).
  5. Evaluation Methods: Intrinsic evaluations, such as probing the linguistic knowledge encoded in a model, are dissected. Extrinsic evaluations through large-scale benchmarks such as GLUE, SuperGLUE, and domain-specific benchmarks assess downstream NLP task performance (a GLUE-loading sketch appears after this list).
  6. Useful Libraries and Tools: The vast ecosystem of libraries for developing and deploying T-PTLMs is briefly covered, pointing to resources that support model training, visualization, and efficient inference (a one-line pipeline example appears after this list).
  7. Future Directions: The paper discusses potential research avenues that may guide further advancements in T-PTLMs. These include exploring more efficient pretraining methods and sample-efficient tasks, improving existing models' robustness to adversarial attacks and noise, and extending model evaluation through better benchmarks.
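As a concrete illustration of the self-supervised pretraining objective discussed in items 1 and 2, here is a minimal sketch of masked language modeling in PyTorch. The tiny encoder, vocabulary size, and [MASK] id are illustrative placeholders (BERT-like values), not the survey's or any particular model's implementation.

```python
# Minimal MLM sketch: corrupt ~15% of tokens with [MASK] and train the model
# to recover them; the loss is computed only on the masked positions.
import torch
import torch.nn as nn

VOCAB_SIZE, MASK_ID, MASK_PROB = 30522, 103, 0.15   # BERT-like placeholder values

class TinyEncoder(nn.Module):
    """Stand-in for a transformer encoder: embedding + linear vocabulary head."""
    def __init__(self, dim=128):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, dim)
        self.head = nn.Linear(dim, VOCAB_SIZE)

    def forward(self, ids):
        return self.head(self.embed(ids))            # (batch, seq, vocab) logits

def mlm_loss(model, token_ids):
    mask = torch.rand(token_ids.shape) < MASK_PROB    # positions to corrupt
    corrupted = token_ids.clone()
    corrupted[mask] = MASK_ID
    labels = token_ids.clone()
    labels[~mask] = -100                              # ignored by cross_entropy
    logits = model(corrupted)
    return nn.functional.cross_entropy(
        logits.view(-1, VOCAB_SIZE), labels.view(-1), ignore_index=-100)

model = TinyEncoder()
fake_batch = torch.randint(0, VOCAB_SIZE, (2, 16))    # random "token ids" as toy data
print(mlm_loss(model, fake_batch).item())
```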
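For item 4, a hedged sketch of the fine-tuning route using the Hugging Face transformers library; the checkpoint name, toy sentences, and labels are illustrative assumptions. Prompt-based tuning would instead phrase the task as a cloze-style prompt handled by the pretrained language modeling head.

```python
# One fine-tuning step: a pretrained encoder plus a freshly initialized
# classification head, updated end-to-end on labeled task data.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)                # new task head, random init
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

batch = tok(["a delightful little film", "a tedious mess"],
            padding=True, return_tensors="pt")
labels = torch.tensor([1, 0])                         # toy sentiment labels

model.train()
loss = model(**batch, labels=labels).loss             # cross-entropy from the head
loss.backward()
optimizer.step()                                      # one gradient update
```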
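For item 5, extrinsic evaluation typically means scoring a fine-tuned model on shared benchmarks. A small sketch assuming the Hugging Face datasets library, using GLUE's SST-2 task as the example:

```python
# Load one GLUE task (SST-2, binary sentiment); a fine-tuned model's accuracy
# on the validation split would be reported alongside the other GLUE tasks.
from datasets import load_dataset

sst2 = load_dataset("glue", "sst2")        # train / validation / test splits
print(sst2)                                # split sizes
print(sst2["validation"][0])               # {'sentence': ..., 'label': ..., 'idx': ...}
```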
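Finally, for the libraries in item 6, the transformers pipeline API is representative of how little code these toolkits require for inference; the default model it downloads is chosen by the library, not specified here.

```python
# Sentiment analysis with a library-chosen default fine-tuned model.
from transformers import pipeline

classifier = pipeline("sentiment-analysis")
print(classifier("This survey is a helpful reference for T-PTLMs."))
```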

The paper avoids sensational terminology and provides a measured, comprehensive examination of T-PTLMs. Its value resides in systematically exposing the intricacies of T-PTLMs and situating existing models within a broader methodological framework, while also hinting at the potential advancements the field may witness as researchers continue to innovate in pretraining paradigms, efficient architectures, and adaptive strategies for NLP tasks.

The clear exposition of concepts coupled with a structured view into ongoing research and practical tools makes it a beneficial resource for researchers seeking to leverage T-PTLMs or contribute novel work in the domain. As T-PTLMs continue to influence both theoretical and application-oriented aspects of NLP, this survey stands as a guidepost for reflecting on past achievements and directing efforts towards fertile research areas in AI.