SwitchOut: An Efficient Data Augmentation Algorithm for Neural Machine Translation
The paper "SwitchOut: an Efficient Data Augmentation Algorithm for Neural Machine Translation" introduces a novel data augmentation algorithm specifically tailored for neural machine translation (NMT). The proposed method, SwitchOut, is grounded in the formulation of an optimization problem aimed at designing data augmentation policies that maximize desired properties like smoothness and diversity.
Key Contributions and Methodology
The primary contribution of this work is a simple yet effective data augmentation strategy for NMT that replaces words in both the source and target sentences with words sampled at random from their respective vocabularies. The method, named SwitchOut, generalizes existing augmentation schemes such as word dropout and Reward Augmented Maximum Likelihood (RAML).
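To make the core operation concrete, here is a minimal per-token sketch in Python. It is an illustrative simplification, not the authors' implementation: the function name `replace_tokens`, the per-token replacement probability `p`, and the toy vocabularies are all assumptions for demonstration.

```python
import random

def replace_tokens(tokens, vocab, p):
    """Independently replace each token, with probability p, by a word
    drawn uniformly from vocab; otherwise keep the original token."""
    return [random.choice(vocab) if random.random() < p else t for t in tokens]

# Toy sentence pair and vocabularies (illustrative only).
src_vocab = ["the", "a", "cat", "dog", "sat", "ran"]
tgt_vocab = ["die", "eine", "Katze", "Hund", "sass"]
src = ["the", "cat", "sat"]
tgt = ["die", "Katze", "sass"]

# SwitchOut perturbs BOTH sides of the sentence pair.
src_aug = replace_tokens(src, src_vocab, p=0.15)
tgt_aug = replace_tokens(tgt, tgt_vocab, p=0.15)

# Word dropout corresponds to the special case vocab == ["<unk>"] (or a
# zeroed embedding); RAML corresponds to perturbing only the target side.
```

Viewed this way, the choice of replacement distribution and of which side to perturb is what distinguishes the existing schemes that SwitchOut subsumes.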
The authors frame data augmentation as an optimization problem whose objective combines two terms: a smoothness term that keeps augmented samples close to the empirical data, and a diversity term that spreads probability mass over a wider set of candidate samples. This is an instance of the maximum-entropy problem, whose analytic solution yields a whole family of realizable augmentation policies.
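In schematic form, with notation lightly adapted from the paper, the objective and its maximum-entropy solution look as follows; here s scores the similarity of an augmented pair to an observed pair, H is the Shannon entropy, and τ is a temperature trading off the two terms:

```latex
% Smoothness (expected similarity s) plus diversity (entropy H):
\max_{q}\; \mathbb{E}_{(\hat{x},\hat{y})\sim q}\big[s(\hat{x},\hat{y})\big] \;+\; \tau\, H(q)
% Maximum-entropy solution, an exponential family in s:
q^{*}(\hat{x},\hat{y}) \;\propto\; \exp\big(s(\hat{x},\hat{y})/\tau\big)
```

SwitchOut instantiates s as a negative Hamming distance to the observed pair, which is what makes the resulting distribution cheap to sample from.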
Concretely, SwitchOut replaces words at independently chosen positions in each sentence with words sampled uniformly from the corresponding vocabulary. This is realized through an efficient sampling procedure, which makes SwitchOut simple to implement and computationally cheap.
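The sketch below illustrates that sampling procedure for a single sentence, assuming (as the paper does) that similarity is measured by negative Hamming distance: first draw the number of replacements d from the induced distribution over distances, then pick d positions and replacement words uniformly. The function name and list-based vocabulary are illustrative assumptions; the paper itself describes an efficient batched implementation.

```python
import math
import random

def sample_switchout(tokens, vocab, tau):
    """Draw an augmented sentence from q(x_hat | x) ∝ exp(-d_H(x_hat, x) / tau),
    where d_H is the Hamming distance. Two steps: sample the distance d,
    then replace d uniformly chosen positions with uniformly chosen words.
    Assumes len(vocab) > 1."""
    L, V = len(tokens), len(vocab)
    # There are C(L, d) * (V - 1)^d sentences at Hamming distance d from
    # `tokens`, so p(d) ∝ C(L, d) * (V - 1)^d * exp(-d / tau).
    log_probs = [
        math.lgamma(L + 1) - math.lgamma(d + 1) - math.lgamma(L - d + 1)  # log C(L, d)
        + d * math.log(V - 1) - d / tau
        for d in range(L + 1)
    ]
    m = max(log_probs)
    weights = [math.exp(lp - m) for lp in log_probs]  # stable softmax weights
    d = random.choices(range(L + 1), weights=weights)[0]

    out = list(tokens)
    for i in random.sample(range(L), d):
        # Replace with a word different from the current one, chosen uniformly.
        out[i] = random.choice([w for w in vocab if w != out[i]])
    return out

vocab = ["the", "a", "cat", "dog", "sat", "ran", "on", "mat"]
print(sample_switchout(["the", "cat", "sat", "on", "the", "mat"], vocab, tau=1.0))
```

In training, the same operation is applied independently to the source and target sentence of each pair, with the temperature tuned on a development set.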
Experimental Evaluation
The authors validate SwitchOut on translation tasks of varying scale: WMT 2015 English-German, IWSLT 2016 German-English, and IWSLT 2015 English-Vietnamese. On these benchmarks, SwitchOut yields consistent gains of roughly +0.5 BLEU over strong baselines without introducing significant computational overhead. Its efficacy is further underscored by its compatibility with complementary augmentation techniques such as RAML and back-translation.
Implications and Future Work
SwitchOut exemplifies how injecting controlled randomness into augmentation can improve the robustness and performance of translation models by broadening the data distribution seen during training. The consistent BLEU gains across datasets suggest practical utility for generalization, particularly in low-resource settings where the empirical data covers the true distribution only sparsely.
The paper opens avenues for future work on augmentation policies derived from the same optimization framework, for example by incorporating linguistic insights or domain-specific notions of similarity. Given its simplicity and adaptability, SwitchOut also holds promise beyond NMT, for other natural language processing tasks that could benefit from smoother, more diverse training distributions.
In summary, this work demonstrates a simple, efficient data augmentation technique with measurable benefits for neural machine translation, and its optimization-based framing offers a foundation for designing future augmentation strategies.