
ERNIE 3.0: Large-scale Knowledge Enhanced Pre-training for Language Understanding and Generation (2107.02137v1)

Published 5 Jul 2021 in cs.CL

Abstract: Pre-trained models have achieved state-of-the-art results in various NLP tasks. Recent works such as T5 and GPT-3 have shown that scaling up pre-trained LLMs can improve their generalization abilities. Particularly, the GPT-3 model with 175 billion parameters shows its strong task-agnostic zero-shot/few-shot learning capabilities. Despite their success, these large-scale models are trained on plain texts without introducing knowledge such as linguistic knowledge and world knowledge. In addition, most large-scale models are trained in an auto-regressive way. As a result, this kind of traditional fine-tuning approach demonstrates relatively weak performance when solving downstream language understanding tasks. In order to solve the above problems, we propose a unified framework named ERNIE 3.0 for pre-training large-scale knowledge enhanced models. It fuses auto-regressive network and auto-encoding network, so that the trained model can be easily tailored for both natural language understanding and generation tasks with zero-shot learning, few-shot learning or fine-tuning. We trained the model with 10 billion parameters on a 4TB corpus consisting of plain texts and a large-scale knowledge graph. Empirical results show that the model outperforms the state-of-the-art models on 54 Chinese NLP tasks, and its English version achieves the first place on the SuperGLUE benchmark (July 3, 2021), surpassing the human performance by +0.8% (90.6% vs. 89.8%).

ERNIE 3.0: Advanced Knowledge Enhanced Pre-training Framework

The paper "ERNIE 3.0: Large-scale Knowledge Enhanced Pre-training for Language Understanding and Generation" introduces a significant advancement in pre-trained LLMs, particularly focusing on enhancing natural language understanding and generation through knowledge integration. ERNIE 3.0 represents an evolution from its predecessors, incorporating a unified framework that combines both auto-regressive and auto-encoding models to seamlessly handle various NLP tasks. The authors address existing limitations in large-scale LLMs and propose a solution in ERNIE 3.0 by integrating a larger corpus of linguistic and world knowledge.

Key Contributions and Experimental Results

ERNIE 3.0 is notable for its architecture, which pairs a Universal Representation Module with Task-specific Representation Modules tailored to distinct task paradigms, namely natural language understanding and natural language generation; a structural sketch of this split follows below. The model is pre-trained on a substantial 4TB corpus, one of the largest reported at the time, comprising plain text and a large-scale knowledge graph. This corpus allows ERNIE 3.0 to internalize knowledge, which is critical for zero-shot and few-shot learning scenarios.
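To make the shared/task-specific split concrete, here is a minimal PyTorch-style sketch of a large shared encoder feeding two smaller task-specific stacks. The class name, layer counts, and dimensions are illustrative assumptions, not the paper's actual configuration (ERNIE 3.0 itself is a 10B-parameter Transformer model).

```python
import torch.nn as nn


class SharedBackboneSketch(nn.Module):
    """Toy illustration of a shared 'universal' encoder feeding two smaller
    task-specific stacks (one for understanding, one for generation).
    Layer counts and sizes are illustrative, not the paper's."""

    def __init__(self, vocab_size=30000, d_model=768, n_shared=12, n_task=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)

        def stack(n_layers):
            layer = nn.TransformerEncoderLayer(d_model, nhead=12, batch_first=True)
            return nn.TransformerEncoder(layer, num_layers=n_layers)

        self.universal = stack(n_shared)   # shared across all tasks
        self.nlu_head = stack(n_task)      # task-specific: understanding
        self.nlg_head = stack(n_task)      # task-specific: generation
        self.lm_proj = nn.Linear(d_model, vocab_size)

    def forward(self, input_ids, task="nlu"):
        # Note: a real generation branch would also apply a causal attention
        # mask; it is omitted here to keep the structural sketch short.
        h = self.universal(self.embed(input_ids))
        h = self.nlu_head(h) if task == "nlu" else self.nlg_head(h)
        return self.lm_proj(h)
```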

The experimental results highlight ERNIE 3.0's performance across diverse NLP tasks: it surpasses state-of-the-art models on 54 Chinese NLP tasks, and its English variant tops the SuperGLUE leaderboard (July 3, 2021) with a score of 90.6, exceeding the human baseline of 89.8 and pre-trained models such as DeBERTa and T5. ERNIE 3.0 is further validated on tasks including sentiment analysis, opinion extraction, closed-book question answering, and named entity recognition, demonstrating its versatility and effectiveness.

Theoretical and Practical Implications

From a theoretical standpoint, ERNIE 3.0 pushes the boundaries of model scale and knowledge integration in pre-trained models. The hybrid approach, which combines auto-regressive and auto-encoding objectives under continual multi-task learning, blends different learning signals to improve generalization; a simplified sketch of such a combined objective follows below. The incorporation of a large-scale knowledge graph into the training process represents a thoughtful synthesis of structured and unstructured data, creating a nuanced framework for both understanding and generation.
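As a rough illustration of how auto-encoding and auto-regressive signals can be trained jointly, the snippet below sums a masked-LM loss and a causal-LM loss into one objective. The function name, label conventions, and equal default weights are assumptions made for illustration; they are not the paper's exact pre-training losses or weighting.

```python
import torch.nn.functional as F


def combined_pretraining_loss(mlm_logits, mlm_labels, clm_logits, clm_labels,
                              mlm_weight=1.0, clm_weight=1.0):
    """Sum an auto-encoding (masked LM) loss and an auto-regressive
    (causal LM) loss. Positions labeled -100 are ignored."""
    # Masked-LM term: predict only the masked positions.
    mlm_loss = F.cross_entropy(
        mlm_logits.reshape(-1, mlm_logits.size(-1)),
        mlm_labels.reshape(-1),
        ignore_index=-100,
    )
    # Causal-LM term: predict token t+1 from the prefix up to t.
    clm_loss = F.cross_entropy(
        clm_logits[:, :-1].reshape(-1, clm_logits.size(-1)),
        clm_labels[:, 1:].reshape(-1),
        ignore_index=-100,
    )
    return mlm_weight * mlm_loss + clm_weight * clm_loss
```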

Practically, ERNIE 3.0 offers advancements in real-world AI applications by providing robust solutions to tasks requiring language understanding, contextual reasoning, and language generation with minimal fine-tuning. By mitigating the high cost of training very large models from scratch for each task, ERNIE 3.0 supports a more resource-effective training regimen through its task-specific networks and continual learning mechanisms; one common way to exploit such a split at fine-tuning time is sketched below. This makes it a compelling option for industry deployment, allowing businesses and organizations to harness AI capabilities at scale without prohibitive costs.
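One practical consequence of the shared/task-specific split is that the large shared backbone can be frozen while only the lightweight task-specific parameters are updated during fine-tuning. The sketch below shows that pattern against the hypothetical SharedBackboneSketch class defined earlier; it is a common fine-tuning strategy for this kind of architecture, not necessarily the paper's exact recipe.

```python
import torch


def finetune_optimizer_for_head(model, head_prefixes=("nlu_head", "lm_proj"), lr=2e-5):
    """Freeze the shared backbone and return an optimizer over the
    task-specific parameters only (prefixes refer to the hypothetical
    SharedBackboneSketch above)."""
    for name, param in model.named_parameters():
        # Train a parameter only if it belongs to one of the task-specific parts.
        param.requires_grad = name.startswith(head_prefixes)
    trainable = [p for p in model.parameters() if p.requires_grad]
    return torch.optim.AdamW(trainable, lr=lr)
```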

Future Directions

Looking ahead, the ERNIE 3.0 framework paves the way for more nuanced integration of multi-modal data (e.g., visual and auditory data) to further strengthen contextual understanding. Its success may also spur research into hybrid architectures that combine distinct model paradigms, as well as into methods for incrementally growing an existing model's knowledge base.

In summary, ERNIE 3.0 not only sets a high benchmark for pre-trained models but also instigates a broader discourse on the design and deployment of large-scale language frameworks. As the field continues to evolve, the insights and methodologies proposed in this paper will undoubtedly guide future innovations in natural language processing.

Authors (22)
  1. Yu Sun (226 papers)
  2. Shuohuan Wang (30 papers)
  3. Shikun Feng (37 papers)
  4. Siyu Ding (6 papers)
  5. Chao Pang (23 papers)
  6. Junyuan Shang (15 papers)
  7. Jiaxiang Liu (39 papers)
  8. Xuyi Chen (9 papers)
  9. Yanbin Zhao (14 papers)
  10. Yuxiang Lu (26 papers)
  11. Weixin Liu (12 papers)
  12. Zhihua Wu (24 papers)
  13. Weibao Gong (5 papers)
  14. Jianzhong Liang (2 papers)
  15. Zhizhou Shang (1 paper)
  16. Peng Sun (210 papers)
  17. Wei Liu (1135 papers)
  18. Xuan Ouyang (9 papers)
  19. Dianhai Yu (37 papers)
  20. Hao Tian (146 papers)
Citations (374)