CPT: A Pre-Trained Unbalanced Transformer for Both Chinese Language Understanding and Generation

Published 13 Sep 2021 in cs.CL (arXiv:2109.05729v4)

Abstract: In this paper, we take advantage of previous pre-trained models (PTMs) and propose a novel Chinese Pre-trained Unbalanced Transformer (CPT). Unlike previous Chinese PTMs, CPT is designed to exploit the shared knowledge between natural language understanding (NLU) and natural language generation (NLG) to boost performance. CPT consists of three parts: a shared encoder, an understanding decoder, and a generation decoder. The two task-specific decoders, together with the shared encoder, are pre-trained with masked language modeling (MLM) and denoising auto-encoding (DAE) tasks, respectively. With this partially shared architecture and multi-task pre-training, CPT can (1) learn task-specific knowledge for both NLU and NLG with its two decoders and (2) be fine-tuned flexibly to fully exploit the potential of the model. Moreover, the unbalanced Transformer reduces computational and storage costs, which makes CPT competitive and greatly accelerates inference for text generation. Experimental results on a wide range of Chinese NLU and NLG tasks show the effectiveness of CPT.

Citations (140)

Summary

  • The paper introduces CPT, a novel transformer that unifies Chinese language understanding and generation by using a shared encoder paired with two specialized decoders.
  • The model employs masked language modeling for NLU and denoising auto-encoding for NLG, achieving an average accuracy of 72.4 on the CLUE benchmark and outperforming models like BERT and RoBERTa.
  • Its efficient architecture demonstrates enhanced performance in sequence labeling and machine reading comprehension tasks, paving the way for unified and cost-effective Chinese NLP solutions.

Overview of CPT: A Pre-Trained Unbalanced Transformer for Chinese Language Tasks

The paper introduces CPT, a Pre-trained Unbalanced Transformer designed for both Chinese natural language understanding (NLU) and natural language generation (NLG). Unlike conventional pre-trained models that target either NLU or NLG in isolation, CPT is structured to leverage the knowledge shared between these tasks to enhance its performance across a range of scenarios.

Architectural Design

CPT's architecture diverges from traditional models by incorporating a shared encoder and two distinct decoders: one dedicated to understanding and the other to generation. The shared encoder captures universal semantic representations, while the task-specific decoders focus on either NLU, employing masked language modeling (MLM), or NLG, leveraging denoising auto-encoding (DAE). This configuration allows CPT to be both computationally efficient and highly adaptable, offering flexibility in fine-tuning to optimize performance on various downstream tasks.
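To make the "unbalanced" layout concrete, here is a minimal structural sketch of the routing described above: a deep shared encoder feeding one of two shallow task-specific decoders. The layer counts (10 encoder layers, 2 per decoder) and the stand-in "layers" (simple callables rather than Transformer blocks) are illustrative assumptions, not the paper's exact configuration.

```python
# Structural sketch of CPT's shared-encoder / dual-decoder routing.
# Hypothetical depths: 10 shared encoder layers, 2 layers per decoder.
# Each "layer" is a toy callable so the control flow runs without a
# deep-learning framework; real layers would be Transformer blocks.

class CPTSketch:
    def __init__(self, enc_layers=10, dec_layers=2):
        # Deep shared encoder: universal semantic representations.
        self.encoder = [lambda h: h + 1 for _ in range(enc_layers)]
        # Shallow understanding decoder (pre-trained with MLM).
        self.u_dec = [lambda h: h * 2 for _ in range(dec_layers)]
        # Shallow generation decoder (pre-trained with DAE).
        self.g_dec = [lambda h: h * 3 for _ in range(dec_layers)]

    def forward(self, x, task):
        h = x
        for layer in self.encoder:   # shared computation for both tasks
            h = layer(h)
        decoder = self.u_dec if task == "nlu" else self.g_dec
        for layer in decoder:        # task-specific computation
            h = layer(h)
        return h
```

Because most parameters live in the shared encoder, the two task heads stay cheap, and fine-tuning can pick whichever decoder (or both) suits the downstream task.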

Experimental Validation

CPT was evaluated across a broad spectrum of Chinese language tasks. The results indicate that CPT not only maintains competitiveness with existing state-of-the-art models but also achieves superior performance in several areas. For instance, CPT achieved an average accuracy of 72.4 on the test sets of the CLUE Benchmark for classification tasks, outperforming prevalent models like RoBERTa and BERT at both base and large sizes.

In sequence labeling and machine reading comprehension (MRC), CPT consistently delivered higher F1 and exact-match (EM) scores, respectively. On datasets such as CMRC and DRCD in particular, CPT surpassed existing benchmarks, demonstrating the efficacy of its dual-decoder approach.

Implications and Future Directions

The implications of CPT are significant for both practical NLP tasks and theoretical advancements in model architecture design. Practically, it reduces the overhead of maintaining separate models for understanding and generation tasks, offering a unified framework that is both cost-effective and efficient.
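The efficiency claim is easy to see with a back-of-envelope count of layer passes during autoregressive generation, where the encoder runs once but the decoder runs once per generated token. The depths below (a balanced 12/12 seq2seq model vs. an unbalanced 10/2 split) are hypothetical round numbers for illustration, and the model ignores per-layer cost differences and attention caching.

```python
# Rough per-sequence decoding cost in "layer passes", assuming the
# encoder runs once and the decoder runs once per generated token.
# Depths are illustrative, not the paper's exact configurations.

def decode_cost(enc_layers, dec_layers, n_tokens):
    return enc_layers + dec_layers * n_tokens

balanced = decode_cost(12, 12, 100)    # balanced encoder-decoder
unbalanced = decode_cost(10, 2, 100)   # shallow generation decoder
speedup = balanced / unbalanced
```

Under these toy numbers the unbalanced layout does far fewer decoder-side layer passes for the same output length, which is why a shallow generation decoder accelerates text generation.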

Theoretically, CPT paves the way for more advanced, unified models that can seamlessly handle diverse tasks. Future research might explore the expansion of this architecture to other languages and further enhancement of its adaptability and efficiency through additional pre-training strategies or model simplifications.

In conclusion, CPT represents a notable advancement in the application of pre-trained models for Chinese NLP tasks, merging the strengths of both NLU and NLG in a single, efficient framework. With its open-source availability, it holds potential for wide adoption and further exploration in natural language understanding and generation.
