Overview of CPT: A Pre-Trained Unbalanced Transformer for Chinese Language Tasks
The paper introduces CPT, a pre-trained unbalanced Transformer designed specifically for Chinese natural language understanding (NLU) and natural language generation (NLG). Unlike conventional pre-trained models that target either NLU or NLG in isolation, CPT is structured to exploit the knowledge shared between the two kinds of tasks, improving its performance across a range of scenarios.
Architectural Design
CPT's architecture diverges from traditional models by pairing a shared encoder with two task-specific decoders: an understanding decoder trained with masked language modeling (MLM) and a generation decoder trained with denoising auto-encoding (DAE). The shared encoder captures universal semantic representations, and the model is "unbalanced" in that this encoder is deep while both decoders are shallow. This configuration keeps CPT computationally efficient and adaptable, since fine-tuning can use either decoder, or both, to best fit a given downstream task.
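To make the unbalanced layout concrete, the following is a minimal PyTorch sketch of a deep shared encoder feeding a shallow understanding decoder (MLM-style, non-autoregressive) and a shallow generation decoder (DAE-style, autoregressive). The class name, dimensions, and omitted details (positional encodings, weight tying, the input corruption scheme) are illustrative assumptions, not the authors' implementation.

```python
import torch.nn as nn


class UnbalancedTransformer(nn.Module):
    """Illustrative sketch: deep shared encoder, two shallow decoders."""

    def __init__(self, vocab_size=21128, d_model=768, n_heads=12,
                 n_enc_layers=10, n_dec_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)

        # Shared encoder: deep stack that both tasks rely on.
        enc_layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, n_enc_layers)

        # Understanding decoder (U-decoder): shallow, non-autoregressive,
        # pre-trained with masked language modeling (MLM).
        u_layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.u_decoder = nn.TransformerEncoder(u_layer, n_dec_layers)

        # Generation decoder (G-decoder): shallow, autoregressive, attends to
        # the encoder output; pre-trained with denoising auto-encoding (DAE).
        g_layer = nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True)
        self.g_decoder = nn.TransformerDecoder(g_layer, n_dec_layers)

        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, input_ids, decoder_input_ids=None, mode="understanding"):
        memory = self.encoder(self.embed(input_ids))
        if mode == "understanding":
            # Predict masked tokens from the U-decoder's refined representations.
            hidden = self.u_decoder(memory)
        else:
            # Reconstruct the original text from a corrupted input (DAE).
            tgt = self.embed(decoder_input_ids)
            causal = nn.Transformer.generate_square_subsequent_mask(tgt.size(1))
            hidden = self.g_decoder(tgt, memory, tgt_mask=causal)
        return self.lm_head(hidden)
```

Concentrating depth in the shared encoder is what makes the design efficient: both the NLU and NLG paths reuse the same heavy encoder computation, and only a light decoder differs between the two modes.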
Experimental Validation
CPT was evaluated across a broad spectrum of Chinese language tasks. The results indicate that CPT is not only competitive with existing state-of-the-art models but also superior in several areas. For instance, CPT achieved an average accuracy of 72.4 on the test sets of the CLUE benchmark classification tasks, outperforming widely used models such as BERT and RoBERTa at both base and large sizes.
On sequence labeling and machine reading comprehension (MRC) tasks, CPT consistently delivered strong F1 and exact-match (EM) scores. On MRC datasets such as CMRC and DRCD in particular, CPT surpassed current benchmarks, demonstrating the efficacy of its dual-decoder approach.
Implications and Future Directions
The implications of CPT are significant for both practical NLP tasks and theoretical advancements in model architecture design. Practically, it reduces the overhead of maintaining separate models for understanding and generation tasks, offering a unified framework that is both cost-effective and efficient.
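As a hedged illustration of this unified usage, the sketch below loads one pre-trained checkpoint for both a classification task and a generation task. The import path `modeling_cpt`, the `CPT*` class names, and the checkpoint `fnlp/cpt-base` follow the authors' open-source release, but the exact names and signatures should be treated as assumptions and verified against the repository.

```python
# Hedged usage sketch: one pre-trained checkpoint serving both NLU and NLG.
# Class names, import path, and checkpoint name are assumptions based on the
# authors' released code; verify them in the repository before use.
import torch
from transformers import BertTokenizer
from modeling_cpt import CPTForConditionalGeneration, CPTForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("fnlp/cpt-base")

# NLU: sentence classification built on the understanding path.
# (The classification head is randomly initialized and would be fine-tuned.)
clf = CPTForSequenceClassification.from_pretrained("fnlp/cpt-base", num_labels=2)
enc = tokenizer("这部电影很好看", return_tensors="pt")
with torch.no_grad():
    logits = clf(input_ids=enc["input_ids"],
                 attention_mask=enc["attention_mask"]).logits

# NLG: generating text with the generation decoder.
gen = CPTForConditionalGeneration.from_pretrained("fnlp/cpt-base")
input_ids = tokenizer.encode("北京是[MASK]的首都", return_tensors="pt")
pred_ids = gen.generate(input_ids, num_beams=4, max_length=20)
print(tokenizer.decode(pred_ids[0], skip_special_tokens=True))
```

The point of the sketch is that both workflows share one set of pre-trained encoder weights, so a team serving classification and generation does not need to pre-train or store two separate backbones.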
Theoretically, CPT paves the way for more advanced unified models that can seamlessly handle diverse tasks. Future research might extend the architecture to other languages, or further improve its adaptability and efficiency through additional pre-training strategies or model simplifications.
In conclusion, CPT represents a notable advancement in the application of pre-trained models for Chinese NLP tasks, merging the strengths of both NLU and NLG in a single, efficient framework. With its open-source availability, it holds potential for wide adoption and further exploration in natural language understanding and generation.