
CPM-2: Large-scale Cost-effective Pre-trained Language Models (2106.10715v3)

Published 20 Jun 2021 in cs.CL

Abstract: In recent years, the size of pre-trained language models (PLMs) has grown by leaps and bounds. However, efficiency issues of these large-scale PLMs limit their utilization in real-world scenarios. We present a suite of cost-effective techniques for the use of PLMs to deal with the efficiency issues of pre-training, fine-tuning, and inference. (1) We introduce knowledge inheritance to accelerate the pre-training process by exploiting existing PLMs instead of training models from scratch. (2) We explore the best practice of prompt tuning with large-scale PLMs. Compared with conventional fine-tuning, prompt tuning significantly reduces the number of task-specific parameters. (3) We implement a new inference toolkit, namely InfMoE, for using large-scale PLMs with limited computational resources. Based on our cost-effective pipeline, we pre-train two models: an encoder-decoder bilingual model with 11 billion parameters (CPM-2) and its corresponding MoE version with 198 billion parameters. In our experiments, we compare CPM-2 with mT5 on downstream tasks. Experimental results show that CPM-2 has excellent general language intelligence. Moreover, we validate the efficiency of InfMoE when conducting inference of large-scale models having tens of billions of parameters on a single GPU. All source code and model parameters are available at https://github.com/TsinghuaAI/CPM.

Overview of CPM-2: Large-scale Cost-effective Pre-trained Language Models

The paper "CPM-2: Large-scale Cost-effective Pre-trained Language Models" presents three techniques for tackling the efficiency challenges of large-scale pre-trained language models (PLMs), which have become financially and computationally prohibitive for broad use as model sizes escalate. The authors introduce CPM-2, a large bilingual encoder-decoder model, together with a cost-effective pipeline of optimizations spanning pre-training, fine-tuning, and inference, aimed at reducing cost without compromising performance.

Contributions and Techniques

  1. Knowledge Inheritance: The researchers introduce 'knowledge inheritance', which accelerates the pre-training of a new model by reusing what previously trained PLMs have already learned instead of training entirely from scratch. This markedly reduces the computation spent in the pre-training phase (a hedged sketch of one possible realization follows this list).
  2. Prompt Tuning: The paper explores prompt tuning, which drastically reduces the number of task-specific parameters needed to adapt large-scale PLMs. Unlike conventional fine-tuning, only the embeddings of the prompt tokens are trained and stored, cutting per-task storage to roughly 0.01% of the full model parameters. Experiments on CPM-2 show that prompt tuning achieves competitive results with far less computational resource usage (sketched after this list).
  3. InfMoE Toolkit: The authors implement InfMoE, an inference toolkit for running large models under tight resource constraints. By adopting a dynamically scheduled offloading strategy, InfMoE can run inference for models with tens of billions of parameters on a single GPU, easing the hardware requirements that otherwise limit the deployment of large PLMs (a simplified sketch follows this list).
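
A minimal sketch of how knowledge inheritance could be realized, assuming a distillation-style objective in which a frozen, previously trained PLM supervises the new model alongside its ordinary language-modeling loss; the function name and hyperparameters (`alpha`, `temperature`) are illustrative, not the authors' exact formulation.

```python
import torch.nn.functional as F

def knowledge_inheritance_loss(student_logits, teacher_logits, labels,
                               alpha=0.5, temperature=2.0):
    """Blend the new model's own LM loss with a distillation term
    computed against a frozen, previously trained PLM (the 'teacher')."""
    vocab = student_logits.size(-1)
    # Ordinary language-modeling cross-entropy on the pre-training data.
    lm_loss = F.cross_entropy(student_logits.view(-1, vocab), labels.view(-1))
    # Temperature-softened KL divergence pulling the new model's token
    # distribution toward the teacher's, as in standard knowledge distillation.
    t = temperature
    kd_loss = F.kl_div(
        F.log_softmax(student_logits / t, dim=-1),
        F.softmax(teacher_logits / t, dim=-1),
        reduction="batchmean",
    ) * (t * t)
    return alpha * kd_loss + (1.0 - alpha) * lm_loss
```

Under this assumption, the teacher is a smaller, already-converged model, so its forward passes add comparatively little cost relative to pre-training a large model from scratch.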
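
A sketch of the prompt-tuning idea described above, assuming a HuggingFace-style model that accepts `inputs_embeds`; the wrapper class, prompt length, and embedding size are placeholders, and only the soft prompt matrix receives gradients.

```python
import torch
import torch.nn as nn

class SoftPromptWrapper(nn.Module):
    """Freeze the PLM and train only a short sequence of prompt embeddings
    that is prepended to every input."""
    def __init__(self, plm, prompt_length=100, embed_dim=4096):
        super().__init__()
        self.plm = plm
        for p in self.plm.parameters():       # the backbone stays frozen
            p.requires_grad = False
        # The only trainable parameters: prompt_length x embed_dim values,
        # a vanishingly small fraction of an 11B-parameter model.
        self.soft_prompt = nn.Parameter(torch.randn(prompt_length, embed_dim) * 0.02)

    def forward(self, inputs_embeds, **kwargs):
        batch = inputs_embeds.size(0)
        prompt = self.soft_prompt.unsqueeze(0).expand(batch, -1, -1)
        # A full implementation would also extend the attention mask
        # to cover the prepended prompt positions.
        return self.plm(inputs_embeds=torch.cat([prompt, inputs_embeds], dim=1),
                        **kwargs)
```

After training, only `soft_prompt` needs to be saved per task, which is where the roughly 0.01% storage figure comes from.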
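
A deliberately simplified sketch of the offloading idea behind InfMoE, assuming top-1 gating and PyTorch expert modules; `gate` and `experts_cpu` are illustrative names, and the real toolkit schedules expert transfers so that communication overlaps with computation, which this loop does not attempt.

```python
import torch

def moe_forward_with_offloading(hidden, gate, experts_cpu, device="cuda"):
    """Apply an MoE layer whose expert weights live in CPU memory,
    moving each needed expert onto the GPU only while its tokens are processed."""
    scores = gate(hidden)                  # [num_tokens, num_experts]
    expert_ids = scores.argmax(dim=-1)     # top-1 routing for simplicity
    output = torch.zeros_like(hidden)
    for eid in expert_ids.unique().tolist():
        mask = expert_ids == eid
        expert = experts_cpu[eid].to(device)   # move this expert's weights in
        output[mask] = expert(hidden[mask])
        expert.to("cpu")                       # move them back to free GPU memory
    return output
```

Keeping only a small number of experts resident on the GPU at a time is what makes it feasible to serve a model like the 198-billion-parameter MoE version of CPM-2 on a single GPU, at the cost of transfer time that InfMoE's scheduler works to hide.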

Experimental Evaluation

The evaluation covers CPM-2, a bilingual encoder-decoder model with 11 billion parameters, and its Mixture-of-Experts (MoE) version with 198 billion parameters. The models are compared against mT5 across a broad set of downstream tasks spanning Chinese-English translation, recall, comprehension, calculation, cross-lingual understanding, summarization, classification, and text generation. The results underscore CPM-2's strong performance in both language understanding and generation.

Implications and Future Directions

The innovations discussed in the paper carry significant implications for both the practical deployment and the theoretical study of PLMs. Practically, they offer pathways to substantially reduce the computational resources required for training and using PLMs, broadening access to sophisticated language models. Theoretically, they open new directions for research on PLM architectures, training procedures, and inference strategies. The results from CPM-2 also suggest that future work could further improve the balance of scalability, efficiency, and effectiveness in PLMs, particularly for cross-lingual and context-sensitive applications.

The methods demonstrated in this research not only strengthen current capabilities but also lay a foundation for the accelerated, wide-scale adoption of large language models across sectors. Future work could build on this framework to keep large models continuously updated with fresh data, further enhancing their utility and relevance.

Authors (19)
  1. Zhengyan Zhang (46 papers)
  2. Yuxian Gu (21 papers)
  3. Xu Han (270 papers)
  4. Shengqi Chen (8 papers)
  5. Chaojun Xiao (39 papers)
  6. Zhenbo Sun (4 papers)
  7. Yuan Yao (292 papers)
  8. Fanchao Qi (33 papers)
  9. Jian Guan (65 papers)
  10. Pei Ke (37 papers)
  11. Yanzheng Cai (1 paper)
  12. Guoyang Zeng (14 papers)
  13. Zhixing Tan (20 papers)
  14. Zhiyuan Liu (433 papers)
  15. Minlie Huang (225 papers)
  16. Wentao Han (6 papers)
  17. Yang Liu (2253 papers)
  18. Xiaoyan Zhu (54 papers)
  19. Maosong Sun (337 papers)
Citations (82)