- The paper extends normalizing flows to discrete data by introducing a change-of-variables formula that bypasses log-determinant computations.
- It proposes discrete autoregressive flows for bidirectional context and discrete bipartite flows for efficient non-autoregressive modeling.
- Empirical evaluations show strong results on synthetic tasks and competitive character-level language modeling, with significantly faster generation than autoregressive baselines.
Analyzing Discrete Flows: Invertible Generative Models of Discrete Data
The paper "Discrete Flows: Invertible Generative Models of Discrete Data" by Dustin Tran et al., investigates a novel approach to modeling discrete data using discrete flows. Traditionally, normalizing flows have been a potent tool for continuous data due to their ability to construct complex high-dimensional distributions. However, this approach has been relatively unexplored in the domain of discrete distributions, which is the focal point of this paper.
Key Contributions
- Extension of Normalizing Flows to Discrete Data: The paper extends normalizing flows to discrete random variables. The authors introduce a change-of-variables formula tailored to discrete data that does not require the log-determinant-Jacobian computation at the heart of continuous flows: as the formulas above show, an invertible map on a discrete set merely relocates probability mass, so no Jacobian term appears. This positions discrete flows as a practical mechanism for modeling discrete sequences with exact likelihoods and bidirectional dependencies.
- Proposed Architectures: Two major architectures are introduced (a runnable sketch of their shared building block follows this list):
- Discrete Autoregressive Flows: These flows enable bidirectional context modeling. By stacking autoregressive transformations that run in alternating directions, the generative model can capture token dependencies both left-to-right and right-to-left, a notable advantage over conventional unidirectional autoregressive models.
- Discrete Bipartite Flows: Analogous to RealNVP for continuous data, discrete bipartite flows support efficient, non-autoregressive generation. They retain tractable exact likelihoods while avoiding token-by-token sequential generation.
- Empirical Results and Evaluations: The paper's experiments compare these models against autoregressive baselines on both synthetic and real-world tasks. Specifically:
- Synthetic Tasks: Discrete autoregressive flows consistently outperform baseline models on tasks such as a discretized mixture of Gaussians, Potts models, and an addition task over discrete numeric sequences.
- Character-Level Language Modeling: On Penn Treebank and text8, discrete bipartite flows achieve competitive likelihoods with significantly faster generation than autoregressive models.
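To make the shared building block concrete, here is a minimal NumPy sketch of the modular location-scale transform and a bipartite (coupling) layer built from it. All names (`mod_affine`, `toy_cond`, and so on) are ours, and the toy conditioner stands in for the neural network the paper would train; the autoregressive variant applies the same `mod_affine` step position by position, conditioned on preceding positions.

```python
import numpy as np

K = 5  # vocabulary size; a prime, so every nonzero scale has an inverse mod K

def mod_affine(x, mu, sigma):
    # Core discrete-flow step: y = (mu + sigma * x) mod K. With sigma
    # coprime to K this is a bijection on {0, ..., K-1}, so probability
    # mass is only relocated and no Jacobian correction is needed.
    return (mu + sigma * x) % K

def mod_affine_inv(y, mu, sigma):
    # Invert with the modular multiplicative inverse of sigma (Python 3.8+).
    sigma_inv = np.array([pow(int(s), -1, K) for s in np.ravel(sigma)])
    return (sigma_inv.reshape(np.shape(sigma)) * (y - mu)) % K

def bipartite_forward(x, cond_fn):
    # Bipartite (coupling) layer, analogous to RealNVP: the first half is
    # copied through, the second half is transformed conditioned on the
    # first half. Both directions run in a single parallel pass, which is
    # why generation is fast.
    d = len(x) // 2
    mu, sigma = cond_fn(x[:d])  # a neural network in practice
    return np.concatenate([x[:d], mod_affine(x[d:], mu, sigma)])

def bipartite_inverse(y, cond_fn):
    d = len(y) // 2
    mu, sigma = cond_fn(y[:d])  # y[:d] == x[:d], so conditioning matches
    return np.concatenate([y[:d], mod_affine_inv(y[d:], mu, sigma)])

def toy_cond(prefix):
    # Hypothetical stand-in for the conditioning network.
    return (2 * prefix + 1) % K, np.full_like(prefix, 3)  # 3 is coprime with 5

x = np.array([3, 1, 4, 0])
y = bipartite_forward(x, toy_cond)
assert np.array_equal(bipartite_inverse(y, toy_cond), x)  # exact invertibility
```

Because the conditioner only ever sees the untransformed half, the inverse pass can recompute exactly the same `mu` and `sigma`, which is what makes the coupling layer invertible by construction.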
Implications and Future Directions
The implications of this research span practical applications and theoretical questions. Practically, discrete flows can benefit systems that must rapidly generate discrete output sequences, as in many NLP applications. They are also strong candidates wherever bidirectional context matters, such as translation and speech processing, where conditioning on the entire sequence can improve output quality.
Theoretically, the results invite further work on scaling discrete flows to large-vocabulary tasks. As the paper suggests, future research could refine the training methodology, for instance by swapping in alternative gradient estimators, to improve efficiency on word-level NLP tasks where vocabularies are extensive.
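For context, the estimator such alternatives would replace is the straight-through trick the paper builds on: emit a hard argmax in the forward pass but let gradients flow through the softmax. A minimal PyTorch sketch (function name and temperature argument are our own):

```python
import torch

def straight_through_onehot(logits, temperature=1.0):
    # Straight-through estimator: the forward pass emits a hard one-hot
    # (argmax), while the backward pass uses the softmax's gradient. The
    # discrete flow needs this because the modular transform consumes
    # discrete mu/sigma values that argmax alone would make
    # non-differentiable.
    soft = torch.softmax(logits / temperature, dim=-1)
    index = soft.argmax(dim=-1, keepdim=True)
    hard = torch.zeros_like(soft).scatter_(-1, index, 1.0)
    return hard + soft - soft.detach()  # value == hard, gradient == soft's

# Gradients reach `logits` even though the output is exactly one-hot.
logits = torch.randn(2, 5, requires_grad=True)
onehot = straight_through_onehot(logits)
(onehot * torch.arange(5.0)).sum().backward()
assert logits.grad is not None
```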
Another direction is to adopt more sophisticated invertible functions from fields such as cryptography, which could yield more flexible discrete transformations than the modular location-scale map.
Conclusion
This paper marks a pivotal step in applying invertible generative models to discrete data. By sidestepping the limitations of purely autoregressive models, the discrete flows presented here offer flexible, efficient, and powerful alternatives for modeling discrete data. The work extends normalizing flows from their traditionally continuous domain into discrete generative modeling, and the insights and methodology articulated here are likely to inform future generative models in machine learning.