GLM: General Language Model Pretraining with Autoregressive Blank Infilling
The paper presents a novel approach to pretraining language models: the General Language Model (GLM), based on autoregressive blank infilling. The method aims to overcome the limitations of existing pretraining frameworks, which include autoencoding models like BERT, autoregressive models like GPT, and encoder-decoder models like T5. None of these frameworks performs best across all of the major families of NLP tasks: natural language understanding (NLU), unconditional generation, and conditional generation.
The proposed GLM framework is pretrained by autoregressively filling in continuous spans of text that have been blanked out of an input sequence, and it combines this objective with span shuffling and 2D positional encoding. By adopting these techniques, GLM aims to perform well across a broad spectrum of applications spanning both NLU and text generation.
Key Contributions and Methodology
- Autoregressive Blank Infilling:
- GLM is designed to fill in randomly masked continuous spans of text in an input sequence.
- It predicts the masked spans autoregressively, one span at a time, while 2D positional encoding keeps track of both inter- and intra-span positions (a minimal sketch of this corruption step follows the list below).
- Multi-Task Pretraining:
- Versatility comes from mixing pretraining objectives that vary the length and number of masked spans.
- A document-level objective (a single long masked span) targets long text generation, while a sentence-level objective (masking full sentences) targets sequence-to-sequence tasks.
- 2D Positional Encoding:
- Each token is assigned two positional IDs to capture its position in the corrupted text and its position within the masked span.
- This positional encoding approach ensures that the model can effectively handle variable-length spans without prior knowledge of their lengths.
- Unified Encoder and Decoder:
- GLM employs a single Transformer whose self-attention mask makes it act as both encoder and decoder: the corrupted text (Part A) attends to itself bidirectionally, while the tokens inside the blanks (Part B) are generated autoregressively, attending to Part A and to the preceding Part B tokens (see the attention-mask sketch after this list).
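To make the corruption step concrete, here is a minimal Python sketch of how an input sequence could be turned into Part A (the text with blanks) plus a shuffled Part B (the spans to predict), together with the two positional id sequences. It is an illustration under stated assumptions rather than the authors' implementation: the function name, the special-token strings, and the `(start, end)` span representation are invented for this example, and span sampling (Poisson-distributed span lengths for the base objective, one long span for the document-level objective, full sentences for the sentence-level objective) is assumed to happen upstream.

```python
import random

# Illustrative special-token strings; the real vocabulary entries will differ.
MASK, START = "[MASK]", "[S]"

def corrupt_for_blank_infilling(tokens, spans):
    """Build a GLM-style training input from `tokens` and a list of
    (start, end) spans to blank out.

    Returns the input token list, the first positional ids (index in the
    corrupted text), the second positional ids (index inside a span, 0 for
    Part A tokens), and the length of Part A.
    """
    spans = sorted(spans)

    # Part A: the original text with each chosen span replaced by one [MASK].
    part_a, cursor = [], 0
    for start, end in spans:
        part_a += tokens[cursor:start] + [MASK]
        cursor = end
    part_a += tokens[cursor:]

    # Position of each span's [MASK] placeholder in Part A, in span order.
    mask_positions = [i for i, tok in enumerate(part_a) if tok == MASK]

    # Part B: the blanked-out spans themselves, in shuffled order so the
    # model cannot rely on a fixed left-to-right fill order. Each span is
    # fed as [S] followed by its tokens; the training target would be the
    # span tokens followed by an end marker (omitted here).
    order = list(range(len(spans)))
    random.shuffle(order)

    input_tokens = list(part_a)
    pos_1 = list(range(len(part_a)))   # first id: position in corrupted text
    pos_2 = [0] * len(part_a)          # second id: 0 everywhere in Part A

    for k in order:
        start, end = spans[k]
        span_tokens = [START] + tokens[start:end]
        input_tokens += span_tokens
        # Every token of a span shares the position of its [MASK] in Part A...
        pos_1 += [mask_positions[k]] * len(span_tokens)
        # ...and counts 1, 2, 3, ... inside the span, so the model never needs
        # to know a span's length in advance.
        pos_2 += list(range(1, len(span_tokens) + 1))

    return input_tokens, pos_1, pos_2, len(part_a)


if __name__ == "__main__":
    tokens = ["x1", "x2", "x3", "x4", "x5", "x6"]
    inp, pos1, pos2, len_a = corrupt_for_blank_infilling(tokens, [(2, 3), (4, 6)])
    print(inp)   # e.g. ['x1', 'x2', '[MASK]', 'x4', '[MASK]', '[S]', 'x5', 'x6', '[S]', 'x3']
    print(pos1)  # e.g. [0, 1, 2, 3, 4, 4, 4, 4, 2, 2]
    print(pos2)  # e.g. [0, 0, 0, 0, 0, 1, 2, 3, 1, 2]
```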
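The unified encoder-decoder behavior likewise comes down to a single self-attention mask over the concatenated sequence rather than two separate networks. The sketch below only illustrates the masking rule described in the paper: Part A positions attend to each other bidirectionally, while Part B positions attend to all of Part A and, causally, to the Part B tokens before them; the function name is again an illustrative choice.

```python
def blank_infilling_attention_mask(len_a, len_b):
    """Self-attention mask for a sequence laid out as Part A followed by
    Part B. mask[i][j] == 1 means query position i may attend to key j."""
    n = len_a + len_b
    mask = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if j < len_a:
                # Every position sees the corrupted context, so Part A is
                # encoded bidirectionally (like an encoder).
                mask[i][j] = 1
            elif i >= len_a and j <= i:
                # Part B positions also see themselves and the Part B tokens
                # before them (causal, like a decoder).
                mask[i][j] = 1
    return mask
```

Because the spans in Part B are shuffled during corruption, this one mask gives the model bidirectional context for cloze-style prediction and left-to-right generation for filling blanks of arbitrary length, which is what allows a single pretrained model to serve both NLU and generation tasks.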
Empirical Results
The paper provides extensive experimental results demonstrating GLM's performance on several tasks. The most notable results include:
- SuperGLUE Benchmark:
- GLM-Base outperforms BERT-Base by 4.6% on average.
- GLM-Large surpasses BERT-Large by 5.0%, indicating significant gains on NLU tasks.
- GLM is also competitive with, and in some cases better than, state-of-the-art models such as T5, BART, and RoBERTa, while using fewer parameters.
- Sequence-to-Sequence Tasks:
- On tasks such as Gigaword summarization and SQuAD question generation, GLM matches or exceeds existing models like MASS and UniLM.
- Text Infilling:
- On the Yahoo Answers dataset, GLM significantly outperforms prior methods such as BERT and BLM, showcasing its effectiveness in generating coherent text spans.
- Language Modeling:
- GLM shows strong zero-shot results, both on language modeling perplexity and on LAMBADA accuracy, further validating its capacity to model long-range dependencies.
Implications and Future Developments
The introduction of GLM has several important implications:
- Unified Pretraining Framework: By effectively combining autoencoding and autoregressive objectives, GLM presents a versatile solution that can be pretrained once and applied across various downstream tasks, simplifying the model deployment process.
- Enhanced Flexibility: The use of 2D positional encoding and span shuffling provides GLM with the capability to handle arbitrary lengths and numbers of masked spans, which is crucial for practical applications involving variable-length text input and output.
- Parameter Efficiency: GLM achieves high performance with fewer parameters than models like T5, highlighting its efficiency.
Looking ahead, future research could explore further fine-tuning techniques and additional pretraining objectives to enhance model performance. GLM's autoregressive blank infilling mechanism also opens doors to potential improvements in areas like interactive text generation and dynamic knowledge integration.
By demonstrating significant improvements across multiple benchmarks and tasks, the GLM framework represents a noteworthy advancement in the development of general pretraining methods for LLMs. It sets a precedent for future models to integrate the strengths of different pretraining objectives into a single unified framework.