CPT: A Pre-Trained Unbalanced Transformer for Both Chinese Language Understanding and Generation (2109.05729v4)

Published 13 Sep 2021 in cs.CL

Abstract: In this paper, we take the advantage of previous pre-trained models (PTMs) and propose a novel Chinese Pre-trained Unbalanced Transformer (CPT). Different from previous Chinese PTMs, CPT is designed to utilize the shared knowledge between natural language understanding (NLU) and natural language generation (NLG) to boost the performance. CPT consists of three parts: a shared encoder, an understanding decoder, and a generation decoder. Two specific decoders with a shared encoder are pre-trained with masked language modeling (MLM) and denoising auto-encoding (DAE) tasks, respectively. With the partially shared architecture and multi-task pre-training, CPT can (1) learn specific knowledge of both NLU or NLG tasks with two decoders and (2) be fine-tuned flexibly that fully exploits the potential of the model. Moreover, the unbalanced Transformer saves the computational and storage cost, which makes CPT competitive and greatly accelerates the inference of text generation. Experimental results on a wide range of Chinese NLU and NLG tasks show the effectiveness of CPT.

Overview of CPT: A Pre-Trained Unbalanced Transformer for Chinese Language Tasks

The paper introduces CPT, a Pre-trained Unbalanced Transformer designed for both Chinese natural language understanding (NLU) and natural language generation (NLG). Unlike conventional pre-trained models that target either NLU or NLG in isolation, CPT is structured to leverage the knowledge shared between the two task families to improve performance across a range of scenarios.

Architectural Design

CPT's architecture diverges from traditional models by incorporating a shared encoder and two distinct decoders: one dedicated to understanding and the other to generation. The shared encoder captures universal semantic representations, while the task-specific decoders focus on either NLU, employing masked language modeling (MLM), or NLG, leveraging denoising auto-encoding (DAE). This configuration allows CPT to be both computationally efficient and highly adaptable, offering flexibility in fine-tuning to optimize performance on various downstream tasks.
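
As a rough picture of this layout, the following PyTorch sketch wires a deep shared encoder to a shallow non-causal "understanding" stack with an MLM head and a shallow autoregressive "generation" decoder with a language-model head. It is a minimal sketch of the idea under stated assumptions, not the released implementation: module names, layer counts, vocabulary size, and hidden dimensions here are illustrative.

```python
# Minimal sketch of the "unbalanced" shared-encoder / dual-decoder layout.
# Assumptions: layer counts, vocab size, and hidden size are illustrative only.
import torch
import torch.nn as nn


class CPTSketch(nn.Module):
    def __init__(self, vocab_size=21128, d_model=768, n_heads=12,
                 enc_layers=10, dec_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)

        # Deep shared encoder: representations reused by both branches.
        enc_layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.shared_encoder = nn.TransformerEncoder(enc_layer, enc_layers)

        # Understanding branch: a shallow non-causal stack pre-trained with MLM,
        # modeled here as extra encoder layers plus a vocabulary head.
        u_layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.u_decoder = nn.TransformerEncoder(u_layer, dec_layers)
        self.mlm_head = nn.Linear(d_model, vocab_size)

        # Generation branch: a shallow autoregressive decoder pre-trained with DAE,
        # cross-attending to the shared encoder's outputs.
        g_layer = nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True)
        self.g_decoder = nn.TransformerDecoder(g_layer, dec_layers)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def encode(self, input_ids):
        return self.shared_encoder(self.embed(input_ids))

    def understand(self, input_ids):
        # NLU path: predict masked tokens (or feed hidden states to a task head).
        hidden = self.u_decoder(self.encode(input_ids))
        return self.mlm_head(hidden)

    def generate_step(self, input_ids, decoder_input_ids):
        # NLG path: autoregressive prediction conditioned on the encoded source.
        memory = self.encode(input_ids)
        tgt = self.embed(decoder_input_ids)
        causal = nn.Transformer.generate_square_subsequent_mask(tgt.size(1))
        hidden = self.g_decoder(tgt, memory, tgt_mask=causal)
        return self.lm_head(hidden)
```

Because the decoders are much shallower than the shared encoder, most parameters and computation are spent once in the encoder, which is where the claimed savings in storage and generation latency come from.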

Experimental Validation

CPT was evaluated across a broad spectrum of Chinese language tasks. The results indicate that CPT not only remains competitive with existing state-of-the-art models but also achieves superior performance in several areas. For instance, CPT reached an average accuracy of 72.4 on the test sets of the CLUE benchmark's classification tasks, outperforming widely used models such as RoBERTa and BERT at both base and large sizes.

In sequence labeling and machine reading comprehension (MRC), CPT also consistently delivered higher F1 and exact-match (EM) scores. On datasets such as CMRC and DRCD in particular, CPT surpassed existing benchmarks, demonstrating the efficacy of its dual-decoder approach.

Implications and Future Directions

The implications of CPT are significant for both practical NLP tasks and theoretical advancements in model architecture design. Practically, it reduces the overhead of maintaining separate models for understanding and generation tasks, offering a unified framework that is both cost-effective and efficient.
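
To make that unified workflow concrete, the snippet below fine-tunes one shared backbone for both a classification task and a generation task, reusing the hypothetical CPTSketch class from the architecture sketch above. The task heads, batch shapes, class counts, and loss choices are illustrative assumptions, not the authors' training recipe.

```python
# Illustrative only: one shared backbone, two task-specific fine-tuning paths.
# Assumes the hypothetical CPTSketch class defined in the earlier sketch is in scope.
import torch
import torch.nn as nn

model = CPTSketch()
classifier = nn.Linear(768, 15)   # e.g. a 15-class classification task (assumed size)
optimizer = torch.optim.AdamW(
    list(model.parameters()) + list(classifier.parameters()), lr=2e-5)

# --- NLU fine-tuning step: classify from the understanding branch ---
input_ids = torch.randint(0, 21128, (8, 64))       # fake batch: 8 sentences, 64 tokens
labels = torch.randint(0, 15, (8,))
hidden = model.u_decoder(model.encode(input_ids))  # shared encoder + understanding stack
logits = classifier(hidden[:, 0])                  # first-token vector as sentence summary
loss_nlu = nn.functional.cross_entropy(logits, labels)

# --- NLG fine-tuning step: seq2seq loss from the generation branch ---
src_ids = torch.randint(0, 21128, (8, 64))
tgt_ids = torch.randint(0, 21128, (8, 32))
lm_logits = model.generate_step(src_ids, tgt_ids[:, :-1])   # teacher forcing
loss_nlg = nn.functional.cross_entropy(lm_logits.reshape(-1, 21128),
                                        tgt_ids[:, 1:].reshape(-1))

# Both losses update the same shared encoder, so one checkpoint can serve both task types.
(loss_nlu + loss_nlg).backward()
optimizer.step()
```

In practice one would usually fine-tune each downstream task separately and simply keep whichever branch it needs; the point is that both paths share the same pre-trained encoder weights.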

Theoretically, CPT paves the way for more advanced, unified models that can seamlessly handle diverse tasks. Future research might explore the expansion of this architecture to other languages and further enhancement of its adaptability and efficiency through additional pre-training strategies or model simplifications.

In conclusion, CPT represents a notable advancement in the application of pre-trained models for Chinese NLP tasks, merging the strengths of both NLU and NLG in a single, efficient framework. With its open-source availability, it holds potential for wide adoption and further exploration in natural language understanding and generation.

Authors (9)
  1. Yunfan Shao
  2. Zhichao Geng
  3. Yitao Liu
  4. Junqi Dai
  5. Hang Yan
  6. Fei Yang
  7. Li Zhe
  8. Hujun Bao
  9. Xipeng Qiu
Citations (140)