COCO-LM: Correcting and Contrasting Text Sequences for Language Model Pretraining (2102.08473v2)

Published 16 Feb 2021 in cs.CL and cs.LG

Abstract: We present a self-supervised learning framework, COCO-LM, that pretrains language models by COrrecting and COntrasting corrupted text sequences. Following ELECTRA-style pretraining, COCO-LM employs an auxiliary language model to corrupt text sequences, upon which it constructs two new tasks for pretraining the main model. The first token-level task, Corrective Language Modeling, is to detect and correct tokens replaced by the auxiliary model, in order to better capture token-level semantics. The second sequence-level task, Sequence Contrastive Learning, is to align text sequences originated from the same source input while ensuring uniformity in the representation space. Experiments on GLUE and SQuAD demonstrate that COCO-LM not only outperforms recent state-of-the-art pretrained models in accuracy, but also improves pretraining efficiency. It achieves the MNLI accuracy of ELECTRA with 50% of its pretraining GPU hours. With the same pretraining steps of standard base/large-sized models, COCO-LM outperforms the previous best models by 1+ GLUE average points.

Analysis of COCO-LM: Correcting and Contrasting Text Sequences for Language Model Pretraining

The paper "COCO-LM: Correcting and Contrasting Text Sequences for Language Model Pretraining" introduces a framework for improving both the effectiveness and the efficiency of pretrained language models (PLMs). The authors propose a self-supervised learning approach built on two tasks: Corrective Language Modeling (CLM) and Sequence Contrastive Learning (SCL). By combining these tasks, COCO-LM achieves superior performance on prominent NLP benchmarks such as GLUE and SQuAD while also reducing pretraining cost.

Novel Framework for Language Model Pretraining

COCO-LM builds upon the ELECTRA framework, in which an auxiliary model produces corrupted text sequences. However, unlike ELECTRA, which trains the main model only on a binary classification task for detecting replaced tokens, COCO-LM employs CLM to both detect and correct these tokens. This task refines token-level semantics and restores the language modeling capability that ELECTRA lacks, a gap that limits ELECTRA's applicability to settings such as prompt-based learning. A sketch of the two token-level objectives follows.
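The PyTorch sketch below illustrates these two token-level objectives under simplifying assumptions: the tensor names (`hidden_states`, `original_ids`, `replaced_mask`), the plain linear heads, and the equal loss weighting are illustrative choices, not the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CorrectiveLMHeads(nn.Module):
    """Minimal sketch of Corrective Language Modeling (CLM): detect which
    tokens the auxiliary model replaced, and recover the original tokens.
    Layer shapes and loss weighting are illustrative assumptions."""

    def __init__(self, hidden_size: int, vocab_size: int):
        super().__init__()
        self.detector = nn.Linear(hidden_size, 1)            # replaced vs. original
        self.corrector = nn.Linear(hidden_size, vocab_size)  # all-token LM head

    def forward(self, hidden_states, original_ids, replaced_mask):
        # hidden_states: [batch, seq_len, hidden] from the main Transformer
        # original_ids:  [batch, seq_len] token ids before corruption
        # replaced_mask: [batch, seq_len] float, 1.0 where the auxiliary model swapped a token
        detect_logits = self.detector(hidden_states).squeeze(-1)
        detect_loss = F.binary_cross_entropy_with_logits(detect_logits, replaced_mask)

        correct_logits = self.corrector(hidden_states)
        correct_loss = F.cross_entropy(
            correct_logits.reshape(-1, correct_logits.size(-1)),
            original_ids.reshape(-1),
        )
        # Equal weighting of the two token-level losses is an assumption here.
        return detect_loss + correct_loss
```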

The second task, SCL, addresses the problem of anisotropic representations by aligning positive pairs drawn from the same source sequence and contrasting them against other sequences in the batch (see the sketch below). Through the integration of these two tasks, COCO-LM trains the main Transformer to produce more discriminative and informative sequence representations.
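A minimal sketch of such a sequence-level contrastive objective is shown below, written in an InfoNCE style with in-batch negatives; the use of the [CLS] vector, cosine similarity, and the temperature value are assumptions for illustration rather than the paper's exact setup.

```python
import torch
import torch.nn.functional as F

def sequence_contrastive_loss(z_view_a, z_view_b, temperature: float = 0.1):
    """Sketch of an InfoNCE-style Sequence Contrastive Learning loss.
    z_view_a / z_view_b: [batch, hidden] sequence representations (e.g. the
    [CLS] vector) of two views of the same source sequence; other sequences
    in the batch serve as negatives. Temperature is an illustrative value."""
    z1 = F.normalize(z_view_a, dim=-1)
    z2 = F.normalize(z_view_b, dim=-1)
    # Cosine similarity between every pair of views in the batch.
    sim = z1 @ z2.t() / temperature                       # [batch, batch]
    targets = torch.arange(sim.size(0), device=sim.device)
    # Each view should be closest to the other view of the same source input.
    return 0.5 * (F.cross_entropy(sim, targets) + F.cross_entropy(sim.t(), targets))
```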

Experimental Evidence and Efficiency Analysis

The authors conduct extensive experiments on the GLUE and SQuAD datasets, reporting gains in both task accuracy and resource utilization. For instance, COCO-LM improves the GLUE average score by more than one point over the best prior models under the same pretraining budget. Furthermore, COCO-LM matches the MNLI accuracy of comparable models such as RoBERTa and ELECTRA while using only 50-60% of their pretraining GPU hours.

Theoretical and Practical Implications

Theoretically, COCO-LM highlights the potential of combining correction and contrastive learning paradigms to enhance language model robustness. The corrective mechanism helps the model capture fine-grained token-level details, while the contrastive task promotes a healthier representation space by encouraging uniformity, a property with notable implications for transfer learning and generalization.
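For readers who want to quantify these properties, the sketch below computes the standard alignment and uniformity measures of Wang and Isola (2020) on L2-normalized embeddings; the function names and the default `alpha`/`t` values follow that common formulation and are not specific to COCO-LM.

```python
import torch

def alignment(x, y, alpha: int = 2):
    # x, y: [n, d] L2-normalized embeddings of positive pairs.
    # Average distance between positive pairs (lower = better aligned).
    return (x - y).norm(dim=1).pow(alpha).mean()

def uniformity(x, t: float = 2.0):
    # x: [n, d] L2-normalized embeddings.
    # Log of the average pairwise Gaussian potential (lower = more uniform).
    sq_dists = torch.pdist(x, p=2).pow(2)
    return sq_dists.mul(-t).exp().mean().log()
```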

Practically, this framework provides a pathway for more computationally efficient large-scale model training. The reduced need for extensive computational resources could democratize access to top-tier PLMs, making it feasible for a broader range of researchers and developers to exploit state-of-the-art NLP technologies in various applications.

Future Directions

Future research could explore alternative data augmentation techniques for constructing contrastive pairs beyond the cropping and masked replacements utilized here. Additionally, optimizing the auxiliary model to dynamically adjust its corruption strategy to better serve the main model's learning cycle remains an open area. Enhanced interaction between auxiliary and main models during pretraining could further advance the efficacy of this dual-model framework.
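As a concrete illustration of the cropping-based view construction mentioned above, a minimal sketch follows; the function name and the `keep_ratio` default are hypothetical choices for illustration, not necessarily the paper's setting.

```python
import random

def crop_view(token_ids, keep_ratio: float = 0.9):
    """Keep a random contiguous span covering keep_ratio of the tokens,
    producing one positive view of the sequence for contrastive learning."""
    n = len(token_ids)
    keep = max(1, int(n * keep_ratio))
    start = random.randint(0, n - keep)
    return token_ids[start:start + keep]
```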

In conclusion, COCO-LM makes a significant contribution to language model pretraining by innovating on self-supervision tasks. It establishes a foundation for both theoretical exploration and practical application, promising to drive further advances in PLM efficiency and effectiveness.

Authors (7)
  1. Yu Meng (92 papers)
  2. Chenyan Xiong (95 papers)
  3. Payal Bajaj (13 papers)
  4. Saurabh Tiwary (15 papers)
  5. Paul Bennett (17 papers)
  6. Jiawei Han (263 papers)
  7. Xia Song (38 papers)
Citations (196)