- The paper introduces stochastic beam search, which applies the Gumbel-Top-k trick to sample sequences without replacement from a sequence model, enabling diverse output generation.
- On neural machine translation, sampling without replacement yields a better trade-off between output diversity and quality (measured by BLEU) than deterministic beam search or independent sampling.
- The same samples yield lower-variance estimators of sentence-level quantities, providing a statistically precise approach to evaluating sequence models.
Stochastic Beam Search with the Gumbel-Top-k Trick
Introduction
The Gumbel-Max trick, traditionally used for sampling from categorical distributions, extends naturally to selecting multiple elements without replacement via the Gumbel-Top-k trick. This paper applies the technique to factorized distributions over sequences: a stochastic beam search samples sequences without replacement, providing an efficient way to generate diverse outputs in tasks such as neural machine translation and image captioning. The method connects sampling with deterministic beam search, combining advantages of both: the no-duplicates guarantee of sampling without replacement and the computational structure of beam search.
Preliminaries
Categorical and Gumbel Distributions
A categorical distribution defines probabilities over a discrete set of outcomes, while the Gumbel distribution supplies the noise for a reparameterization of categorical sampling. In the Gumbel-Max trick, independent Gumbel noise is added to each log-probability; the argmax of the perturbed values is then an exact sample from the categorical distribution (whereas the argmax of the unperturbed log-probabilities would merely return the mode).
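The trick can be sketched in a few lines of Python. This is a minimal illustration, not the paper's code; the helper names are ours:

```python
import math
import random

def gumbel():
    # Standard Gumbel(0, 1) noise via the inverse CDF: G = -log(-log(U)).
    return -math.log(-math.log(random.random()))

def gumbel_max_sample(log_probs):
    # The Gumbel-Max trick: argmax_i (log p_i + G_i) is an exact sample
    # from Categorical(p), not merely the mode.
    return max(range(len(log_probs)), key=lambda i: log_probs[i] + gumbel())
```

Drawing many samples and tallying frequencies reproduces the categorical probabilities, which is a quick sanity check for any implementation.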
Gumbel-Top-k Trick
Extending the Gumbel-Max trick to the top-k setting (the Gumbel-Top-k trick) yields a sample of k elements without replacement: perturb every log-probability with independent Gumbel noise and keep the k largest. Applied directly to sequences, this would require instantiating every sequence in an exponentially large domain; the key contribution is a stochastic beam search that avoids this, with cost linear in the number of samples and the sequence length.
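For a categorical distribution small enough to enumerate, the trick is a one-sorted-pass generalization of the Gumbel-Max sketch above (again an illustrative sketch, with our own helper names):

```python
import math
import random

def gumbel():
    # Standard Gumbel(0, 1) noise: G = -log(-log(U)).
    return -math.log(-math.log(random.random()))

def gumbel_top_k(log_probs, k):
    # Perturb every log-probability with independent Gumbel noise and keep
    # the indices of the k largest perturbed values: an exact sample of
    # k distinct elements, drawn without replacement.
    perturbed = sorted(((lp + gumbel(), i) for i, lp in enumerate(log_probs)),
                       reverse=True)
    return [i for _, i in perturbed[:k]]
```

The first element of the returned sample is distributed as an ordinary Gumbel-Max draw; each subsequent element follows the distribution renormalized over the items not yet selected. The stochastic beam search of the paper exists precisely because this enumerate-and-sort version is infeasible over all sequences.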
Methodology
Sequence Models
Sequence models, prevalent in machine learning tasks, define the probability of a token sequence as a product of parametric conditional distributions, one per token. When the distribution is peaked, sampling with replacement returns many duplicate sequences and thus low variability, motivating sampling without replacement.
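The factorization can be made concrete with a toy model. The conditional below is a hypothetical stand-in for a learned network (its logits are an arbitrary deterministic function of the prefix); only the factorized structure matters:

```python
import math

VOCAB = 3

def conditional_log_probs(prefix):
    # Hypothetical stand-in for a learned model p(token | prefix):
    # deterministic logits derived from the prefix, softmax-normalized.
    logits = [math.sin(1 + len(prefix) + 2 * t + sum(prefix))
              for t in range(VOCAB)]
    z = math.log(sum(math.exp(l) for l in logits))
    return [l - z for l in logits]

def sequence_log_prob(seq):
    # Factorized model: log p(seq) = sum_t log p(seq[t] | seq[:t]).
    return sum(conditional_log_probs(seq[:t])[tok]
               for t, tok in enumerate(seq))
```

Because every conditional is normalized, the probabilities of all sequences of a fixed length sum to one, which is what licenses treating the model as one big categorical distribution over sequences.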
Stochastic Beam Search
Stochastic beam search combines beam search with the Gumbel-Top-k trick. Gumbel perturbations are sampled top-down through the search tree, with the maximum over each node's children conditioned to equal the parent's perturbed value; this ensures the leaves' perturbed log-probabilities are distributed exactly as if every complete sequence had been perturbed independently. Expanding only the k highest-scoring nodes at each level therefore recovers the top-k perturbed sequences, i.e. a sample of k sequences without replacement, at the cost of an ordinary beam search and with no duplicate samples.
Experimental Results
Application in Neural Translation
Applied to neural machine translation, stochastic beam search increases the diversity of translations without compromising average quality: as the sampling temperature is varied, it traces a better BLEU-versus-diversity trade-off than deterministic beam search and sampling with replacement.
Estimation of Sentence-Level Metrics
The method also offers practical advantages for estimating sentence-level quantities such as expected BLEU score and model entropy. Because the sampled sequences are distinct, importance-weighted estimators built on them have lower variance than naive Monte Carlo estimates from sampling with replacement, which benefits model evaluation and training objectives that require statistically precise estimates.
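For a categorical distribution small enough to enumerate, the importance-weighting idea can be sketched as follows. This is the unnormalized, Horvitz-Thompson-style form with the (k+1)-th largest perturbed value acting as an empirical threshold, as in priority sampling; the paper additionally considers a normalized variant, and the helper names here are ours:

```python
import math
import random

def gumbel():
    # Standard Gumbel(0, 1) noise: G = -log(-log(U)).
    return -math.log(-math.log(random.random()))

def estimate_without_replacement(log_probs, values, k):
    # Gumbel-Top-k sample of size k; kappa, the (k+1)-th largest
    # perturbed value, serves as the threshold. Requires k < len(log_probs).
    perturbed = sorted(((lp + gumbel(), i) for i, lp in enumerate(log_probs)),
                       reverse=True)
    kappa = perturbed[k][0]
    est = 0.0
    for _, i in perturbed[:k]:
        # q_i = P(perturbed value of item i exceeds kappa), used as a
        # Horvitz-Thompson style inclusion probability.
        q_i = 1.0 - math.exp(-math.exp(log_probs[i] - kappa))
        est += math.exp(log_probs[i]) / q_i * values[i]
    return est
```

Averaging the estimator over repeated draws should recover the true expectation, since conditioning on the threshold makes the inclusion events independent across items.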
Implications
This paper proposes stochastic beam search as a viable alternative to established sampling and beam search methods in sequence modeling tasks. By balancing diversity and statistical estimation requirements, the technique shows promise in practical AI applications needing representative sequence sets, such as translation and captioning. Additionally, theoretical implications suggest further exploration of statistical learning through the probabilistic interpretation of beam search.
Conclusion
The stochastic beam search method marries the benefits of sampling and deterministic search techniques, demonstrating effectiveness in practical applications while facilitating robust statistical estimations. Future work may explore the extension of its probabilistic framework in distributed and adaptive learning environments, capitalizing on the method's capacity for producing diverse, high-quality sequence selections.