
Improving Neural Machine Translation with Conditional Sequence Generative Adversarial Nets (1703.04887v4)

Published 15 Mar 2017 in cs.CL

Abstract: This paper proposes an approach for applying GANs to NMT. We build a conditional sequence generative adversarial net which comprises two adversarial sub-models, a generator and a discriminator. The generator aims to generate sentences that are hard to distinguish from human-translated sentences (i.e., the golden target sentences), and the discriminator makes efforts to discriminate the machine-generated sentences from human-translated ones. The two sub-models play a mini-max game and achieve a win-win situation when they reach a Nash equilibrium. Additionally, the static sentence-level BLEU is utilized as the reinforced objective for the generator, which biases the generation towards high BLEU scores. During training, both the dynamic discriminator and the static BLEU objective are employed to evaluate the generated sentences, and the evaluations are fed back to guide the learning of the generator. Experimental results show that the proposed model consistently outperforms the traditional RNNSearch and the newly emerged state-of-the-art Transformer on English-German and Chinese-English translation tasks.

Authors (4)
  1. Zhen Yang (160 papers)
  2. Wei Chen (1290 papers)
  3. Feng Wang (408 papers)
  4. Bo Xu (212 papers)
Citations (167)

Summary

An Overview of "Improving Neural Machine Translation with Conditional Sequence Generative Adversarial Nets"

The paper "Improving Neural Machine Translation with Conditional Sequence Generative Adversarial Nets" presents a novel approach to enhance Neural Machine Translation (NMT) systems using Generative Adversarial Networks (GANs). Specifically, it introduces a framework called BLEU-reinforced Conditional Sequence Generative Adversarial Net (BR-CSGAN), which seeks to address limitations associated with traditional NMT objectives like Maximum Likelihood Estimation (MLE).

Key Contributions

The research introduces an approach that diverges from typical NMT optimization methods. It employs a conditional GAN architecture comprising a generator and a discriminator. The generator is tasked with producing target-language sentences that are difficult to distinguish from human translations, while the discriminator attempts to differentiate machine-generated from human-translated sentences. Incorporating GANs, which have traditionally shown success in computer vision, into the sequence generation framework is a notable undertaking.
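As a rough illustration of this adversarial setup, the sketch below trains a toy generator with REINFORCE, using a stand-in discriminator's score as the reward. The vocabulary, the word-overlap discriminator, the per-position softmax generator, and the learning rate are all inventions of this example, not the paper's implementation:

```python
import math
import random

random.seed(0)

# Toy vocabulary and a "gold" target sentence the discriminator prefers.
VOCAB = ["the", "cat", "sat", "mat"]
GOLD = ("the", "cat", "sat")

# Generator: one independent softmax per target position (a crude stand-in
# for the paper's RNN decoder).
logits = [[0.0] * len(VOCAB) for _ in range(len(GOLD))]

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

def sample_sentence():
    """Sample a sentence, recording the sampled index at each position."""
    words, picks = [], []
    for pos_logits in logits:
        probs = softmax(pos_logits)
        i = random.choices(range(len(VOCAB)), weights=probs)[0]
        words.append(VOCAB[i])
        picks.append(i)
    return tuple(words), picks

def discriminator(sentence):
    """Toy discriminator: 'human' probability grows with overlap with GOLD."""
    hits = sum(1 for a, b in zip(sentence, GOLD) if a == b)
    return hits / len(GOLD)

LR = 0.5
for _ in range(500):
    sent, picks = sample_sentence()
    reward = discriminator(sent)  # D's score is the REINFORCE reward
    for pos, i in enumerate(picks):
        probs = softmax(logits[pos])
        for j in range(len(VOCAB)):
            # gradient of log p(i) w.r.t. logit j is 1[j == i] - p(j)
            grad = (1.0 if j == i else 0.0) - probs[j]
            logits[pos][j] += LR * reward * grad

# Greedy decode after training: the generator should now favor GOLD.
best = tuple(VOCAB[max(range(len(VOCAB)), key=lambda j: logits[p][j])]
             for p in range(len(GOLD)))
print(best)
```

Because the reward comes from the discriminator rather than a fixed loss, the generator receives feedback even for samples that never exactly match the reference, which is the core motivation for the adversarial objective.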

A notable feature of this approach is the use of the smoothed sentence-level BLEU score as a reinforced objective for the generator. This biases the NMT system towards generating sentences with higher BLEU scores, integrating both dynamic discriminative feedback and static BLEU evaluation to guide learning. This dual approach aims to balance GAN-derived rewards against explicit BLEU optimization, potentially providing a more comprehensive reward structure than BLEU alone.
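One way to sketch such a combined reward is a weighted mix of the discriminator probability and a smoothed sentence-level BLEU. The add-1 smoothing, the brevity penalty formulation, and the mixing weight `lam` below are assumptions of this example; the paper does not prescribe these exact choices:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def smoothed_sentence_bleu(hyp, ref, max_n=4):
    """Sentence-level BLEU with add-1 smoothing on the n-gram precisions."""
    log_prec = 0.0
    for n in range(1, max_n + 1):
        hyp_ng, ref_ng = ngrams(hyp, n), ngrams(ref, n)
        overlap = sum(min(c, ref_ng[g]) for g, c in hyp_ng.items())
        total = sum(hyp_ng.values())
        # add-1 smoothing keeps the score non-zero when a higher-order
        # n-gram has no match (the usual failure mode of sentence BLEU)
        log_prec += math.log((overlap + 1) / (total + 1))
    # brevity penalty: penalize hypotheses shorter than the reference
    bp = min(1.0, math.exp(1 - len(ref) / max(len(hyp), 1)))
    return bp * math.exp(log_prec / max_n)

def combined_reward(d_prob, hyp, ref, lam=0.7):
    """Mix the dynamic discriminator score with the static BLEU objective."""
    return lam * d_prob + (1 - lam) * smoothed_sentence_bleu(hyp, ref)
```

Smoothing matters here because corpus-level BLEU is undefined per sentence: a single missing 4-gram would otherwise zero out the whole reward signal for that sample.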

Experimental Results

The experiments conducted on English-German and Chinese-English translation tasks demonstrate that the proposed BR-CSGAN consistently outperforms previous methods, including RNNSearch and Transformer models, on standard benchmarks. Specifically, the paper reports improvements of up to +1.83 BLEU points in some scenarios when employing BR-CSGAN, underscoring the efficacy of integrating adversarial training with traditional BLEU-focused strategies.

Comparison with Minimum Risk Training

The relationship between BR-CSGAN and Minimum Risk Training (MRT) is explored, emphasizing that BR-CSGAN serves as an advanced form of MRT by dynamically adapting with GAN feedback while retaining a BLEU objective. This comparison both validates the novel approach and highlights its advantages over alternative methods. Results suggest BR-CSGAN provides a balanced optimization path with improvements in BLEU scores across evaluated datasets.
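For reference, the MRT objective that BR-CSGAN is compared against minimizes the expected loss over a sampled candidate subset, with the model distribution renormalized over that subset. The `(log_prob, loss)` representation and the sharpness value `alpha` below are illustrative assumptions:

```python
import math

def mrt_risk(candidates, alpha=0.005):
    """
    Expected risk over a sampled candidate subset, MRT-style.
    Each candidate is (model_log_prob, loss), e.g. loss = 1 - BLEU(y, y*).
    Weights are P(y|x)^alpha = exp(alpha * log P), renormalized over the subset;
    alpha controls how sharply the distribution concentrates on likely candidates.
    """
    weights = [math.exp(alpha * lp) for lp, _ in candidates]
    z = sum(weights)
    return sum(w / z * loss for w, (_, loss) in zip(weights, candidates))
```

Under this view, the GAN discriminator can be read as replacing MRT's fixed loss with a learned, evolving one, which is the sense in which the paper positions BR-CSGAN as a dynamic extension of MRT.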

Implications and Future Directions

The BR-CSGAN framework opens theoretical and practical avenues in sequence generation, particularly for tasks like machine translation where sentence-level quality is crucial. By integrating generative adversarial training, the model not only learns from static, sentence-level BLEU feedback but also from the dynamically evolving discriminator, suggesting a potential for more natural and contextually appropriate translations.

Future research could explore multi-adversarial frameworks, incorporating multiple discriminators and generators, potentially leading to further improvements in translation models. This paper provides a foundational base for further exploration of GANs in sequence prediction tasks beyond NMT.

In summary, this paper introduces a significant development in NMT by blending adversarial learning with traditional sequence evaluation metrics, marking a step forward in achieving more accurate and human-like translations. The BR-CSGAN framework is not only adaptable to existing NMT systems but also offers a promising direction for enhancing the performance of neural architectures in language understanding and generation tasks.