Beam Search Strategies for Neural Machine Translation: A Review
The paper "Beam Search Strategies for Neural Machine Translation" by Freitag and Al-Onaizan addresses a notable challenge in Neural Machine Translation (NMT): speeding up the decoding process while maintaining translation quality. The authors propose enhancements to the conventional beam search algorithm, the approximate search procedure NMT systems use to find a high-probability translation of a source sentence.
Context and Motivation
NMT has seen substantial advancements, often surpassing traditional statistical machine translation (SMT) methods. However, the efficiency of NMT is constrained by the beam search strategy typically employed to generate translations. The standard approach constructs the target sentence incrementally, maintaining a fixed number of candidate hypotheses, the beam size, at each step. While increasing the beam size can improve translation accuracy, decoding time grows roughly in proportion to it, which detracts from practical applicability in real-world scenarios.
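The incremental procedure described above can be sketched as follows. This is a minimal, generic beam search over log-probabilities, not the paper's implementation; the names `beam_search` and `step_fn` (a stand-in for the NMT model's next-word distribution) are illustrative.

```python
import math

def beam_search(step_fn, start_token, end_token, beam_size, max_len):
    """Minimal beam search over log-probabilities.

    step_fn(prefix) -> dict mapping each next token to its log-probability.
    Returns the highest-scoring complete hypothesis as (token_list, score).
    """
    beam = [([start_token], 0.0)]  # (prefix, cumulative log-probability)
    finished = []
    for _ in range(max_len):
        # Expand every active hypothesis by every possible next word.
        candidates = []
        for prefix, score in beam:
            for tok, logp in step_fn(prefix).items():
                candidates.append((prefix + [tok], score + logp))
        # Keep only the beam_size best partial hypotheses.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beam = []
        for prefix, score in candidates[:beam_size]:
            if prefix[-1] == end_token:
                finished.append((prefix, score))
            else:
                beam.append((prefix, score))
        if not beam:
            break
    return max(finished + beam, key=lambda c: c[1])
```

Because every surviving hypothesis is re-expanded at every step, the work per step scales with the beam size, which is exactly the cost the paper's pruning strategies aim to reduce.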
This paper seeks to enhance the efficiency of beam search by introducing pruning techniques that dynamically shrink the active candidate set at each time step based on candidate scores. The principal goal is to significantly speed up the translation process without sacrificing output quality.
Proposed Methodologies
The authors explore several pruning strategies as follows:
- Relative Threshold Pruning: discards candidates whose score falls below a fixed fraction of the best candidate's score.
- Absolute Threshold Pruning: discards candidates whose score trails the best candidate's by more than a fixed margin.
- Relative Local Threshold Pruning: applies the relative criterion to the score of the most recently generated word only, disregarding the cumulative score.
- Maximum Candidates per Node: caps the number of candidates sharing the same predecessor, limiting near-duplicate partial hypotheses and promoting diversity.
These techniques adjust the beam size dynamically to exclude less promising candidates, improving efficiency.
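The four rules above can be sketched as a single per-step filter. This is a log-space sketch rather than the paper's exact formulation (the paper states the relative and absolute thresholds over probabilities; in log space a relative fraction becomes an additive `log` offset, and the absolute margin here is taken directly as a log-probability margin). The function name `prune_beam` and the candidate dictionary layout are assumptions for illustration.

```python
import math
from collections import defaultdict

def prune_beam(candidates, rel=None, absolute=None, rel_local=None,
               max_per_node=None):
    """Apply score-based pruning rules to one decoding step's candidates.

    Each candidate is a dict with:
      'score'      - cumulative log-probability of the partial hypothesis
      'last_score' - log-probability of the most recently generated word
      'parent'     - index of its predecessor in the previous beam
    rel / rel_local are fractions in (0, 1]; absolute is a log-prob margin.
    Any rule left as None is skipped.
    """
    survivors = sorted(candidates, key=lambda c: c['score'], reverse=True)
    best = survivors[0]['score']
    best_local = max(c['last_score'] for c in survivors)

    if rel is not None:  # relative threshold pruning
        survivors = [c for c in survivors
                     if c['score'] >= best + math.log(rel)]
    if absolute is not None:  # absolute threshold pruning
        survivors = [c for c in survivors if c['score'] >= best - absolute]
    if rel_local is not None:  # relative local threshold pruning
        survivors = [c for c in survivors
                     if c['last_score'] >= best_local + math.log(rel_local)]
    if max_per_node is not None:  # maximum candidates per node
        kept, per_parent = [], defaultdict(int)
        for c in survivors:  # survivors are already sorted best-first
            if per_parent[c['parent']] < max_per_node:
                kept.append(c)
                per_parent[c['parent']] += 1
        survivors = kept
    return survivors
```

Because the filter runs before the pruned candidates are fed back into the model, any candidate removed here saves a full expansion step, which is where the decoding speed-up comes from.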
Experimental Findings
The experiments, performed on German-to-English and Chinese-to-English translation tasks, demonstrate promising results. Combining the proposed pruning techniques sped up decoding by up to 43% in certain setups without degrading translation quality as measured by BLEU and TER.
In the German-to-English setup, the efficiency gains reached 13% at a beam size of 5 and rose to 43% at a beam size of 14. Similar results were observed in the Chinese-to-English experiments, with a 24% speed improvement at a beam size of 14. These results confirm that the proposed strategies reduce computational cost by eliminating unnecessary candidate evaluations while preserving translation quality.
Implications and Future Directions
This research contributes to the field of machine translation by proposing enhancements to a critical component, beam search. By enabling faster decoding, these strategies make NMT more viable for real-time applications, where translation time is as crucial as accuracy.
Future research may explore how these methods could integrate with other advancements in NMT, such as transformer-based models, or investigate applications in low-resource language pairs. Additionally, the implications of these strategies on other sequential decoding tasks, such as speech recognition or image captioning, could provide further valuable insights.
Overall, the paper underscores the potential of optimizing NMT decoding processes, paving the way for more efficient and scalable machine translation systems.