ERNIE-GEN: An Enhanced Multi-Flow Pre-training and Fine-tuning Framework for Natural Language Generation (2001.11314v3)

Published 26 Jan 2020 in cs.CL and cs.LG

Abstract: Current pre-training works in natural language generation pay little attention to the problem of exposure bias on downstream tasks. To address this issue, we propose an enhanced multi-flow sequence to sequence pre-training and fine-tuning framework named ERNIE-GEN, which bridges the discrepancy between training and inference with an infilling generation mechanism and a noise-aware generation method. To make generation closer to human writing patterns, this framework introduces a span-by-span generation flow that trains the model to predict semantically-complete spans consecutively rather than predicting word by word. Unlike existing pre-training methods, ERNIE-GEN incorporates multi-granularity target sampling to construct pre-training data, which enhances the correlation between encoder and decoder. Experimental results demonstrate that ERNIE-GEN achieves state-of-the-art results with a much smaller amount of pre-training data and parameters on a range of language generation tasks, including abstractive summarization (Gigaword and CNN/DailyMail), question generation (SQuAD), dialogue generation (Persona-Chat) and generative question answering (CoQA).

Citations (121)

Summary

  • The paper introduces an infilling and noise-aware mechanism to reduce exposure bias in natural language generation.
  • It presents a span-by-span generation flow that predicts semantically complete spans for improved text coherence.
  • The multi-flow attention architecture and multi-granularity pre-training enable state-of-the-art performance on diverse NLG tasks.

An Evaluation of ERNIE-GEN: Multi-Flow Pre-training and Fine-tuning for NLG

The paper ERNIE-GEN: An Enhanced Multi-Flow Pre-training and Fine-tuning Framework for Natural Language Generation introduces an approach to addressing exposure bias in natural language generation (NLG). The work presents ERNIE-GEN, a sequence-to-sequence (seq2seq) model designed to improve generation quality through new pre-training and fine-tuning techniques.

Core Contributions

  1. Infilling and Noise-Aware Generation: The authors propose an infilling generation mechanism and a noise-aware generation method. Instead of conditioning each step only on the last decoded word, the infilling technique inserts an artificial [ATTN] symbol that attends to the broader history, mitigating the cumulative errors induced by teacher forcing. In addition, the noise-aware method corrupts the target sequence fed to the decoder by randomly replacing words, making the model more robust to its own mistakes during inference (a minimal sketch of this corruption step follows the list).
  2. Span-by-Span Generation Flow: Departing from traditional word-by-word generation, ERNIE-GEN includes a span-by-span approach, enabling predictions of semantically complete spans. This method aligns more closely with human writing patterns, improving coherence in generated text.
  3. Multi-Granularity Target Fragments: The pre-training phase samples target fragments of varied lengths from the input, strengthening the correlation between encoder and decoder. Mixing word-level and sentence-level fragments equips the model for both local phrasing and longer-range generation (a sampling sketch also appears after the list).
  4. Architecture Design: The Multi-Flow Attention architecture trains the word-by-word and span-by-span generation flows simultaneously through a shared Transformer, which improves computational efficiency and keeps the flows consistent across tasks.
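
The noise-aware generation step in the first contribution is simple enough to illustrate directly. Below is a minimal Python sketch, assuming a uniform replacement distribution over the vocabulary and an illustrative noise_rate; neither choice is taken from the paper.

```python
import random

def corrupt_decoder_input(target_tokens, vocab, noise_rate=0.05, seed=None):
    """Randomly replace a fraction of the gold target tokens before they are
    fed to the decoder, so the model learns to keep generating well even when
    its history contains mistakes (noise-aware generation).

    `noise_rate` and uniform sampling over `vocab` are illustrative
    assumptions, not the paper's exact settings.
    """
    rng = random.Random(seed)
    corrupted = list(target_tokens)
    for i in range(len(corrupted)):
        if rng.random() < noise_rate:
            corrupted[i] = rng.choice(vocab)
    return corrupted

# The decoder is still trained to predict the clean target, but from a noisy
# history, which narrows the gap between training and inference.
vocab = ["the", "cat", "sat", "on", "a", "mat", "dog", "ran"]
print(corrupt_decoder_input(["the", "cat", "sat", "on", "the", "mat"], vocab,
                            noise_rate=0.3, seed=0))
```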

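Multi-granularity target sampling (the third contribution) can likewise be approximated with a short routine. This is a hedged sketch: the span-length ranges, mixing probability, and coverage budget are placeholder values rather than the paper's hyperparameters, and this simplified version does not prevent fragments from overlapping.

```python
import random

def sample_target_fragments(tokens, rng=None, word_range=(1, 4),
                            sent_range=(4, 32), sent_prob=0.25, budget=0.25):
    """Draw generation targets of mixed granularity from an input sequence.

    With probability `sent_prob` a longer, sentence-like fragment is sampled,
    otherwise a short word-level one; fragments are collected until roughly
    `budget` of the sequence is covered. All numbers are illustrative
    placeholders, not the paper's settings.
    """
    rng = rng or random.Random(0)
    n = len(tokens)
    fragments, covered = [], 0
    while covered < budget * n:
        lo, hi = sent_range if rng.random() < sent_prob else word_range
        length = min(rng.randint(lo, hi), n)
        start = rng.randint(0, n - length)
        fragments.append((start, tokens[start:start + length]))
        covered += length
    return fragments  # (position, fragment) pairs used as decoder targets

# Example: sample fragments from a toy token sequence.
text = "ernie gen samples both short and long fragments as generation targets".split()
print(sample_target_fragments(text, rng=random.Random(1)))
```
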
Experimental Results

The empirical evaluations show that ERNIE-GEN achieves state-of-the-art results on multiple NLG tasks, including abstractive summarization, question generation, and dialogue response generation. Notably, it outperforms strong baselines such as UniLM and PEGASUS while using less pre-training data. For instance, on the full Gigaword dataset, the model shows a marked improvement in ROUGE scores.

Practical and Theoretical Implications

The findings have immediate applications in tasks requiring coherent language production, such as automated summarization and conversational AI. By reducing exposure bias, ERNIE-GEN offers a reliable framework for improving seq2seq performance even when pre-training data is limited.

From a theoretical perspective, the work challenges conventional left-to-right, word-by-word generation, arguing for richer historical context and greater error resilience at inference time. This shift may influence future developments in pre-trained LLMs, particularly in balancing large-scale data usage with efficient architecture design.

Future Developments

The paper provides groundwork for further exploration into pre-training strategies. Future research could delve into integrating reinforcement learning within this framework, potentially refining exposure bias management. Moreover, expanding ERNIE-GEN's applications beyond current NLG tasks could test its robustness across diverse linguistic contexts.

In conclusion, ERNIE-GEN stands out as a promising advancement, offering significant improvements over established methods. The strong numerical results across various benchmarks underscore its potential impact on both theoretical research and real-world AI applications.
