Improved Variational Autoencoders for Text Modeling using Dilated Convolutions
This paper addresses the challenges of applying Variational Autoencoders (VAEs) to text, specifically for language modeling and semi-supervised learning tasks. The authors propose dilated convolutional neural networks (CNNs) as an alternative to the long short-term memory (LSTM) networks typically employed as decoders in VAEs. The motivation stems from the observation that LSTM decoders often fail to make use of the latent representation, causing the model to collapse into a plain language model that ignores the latent code and offers little perplexity benefit.
Key Contributions
- Dilated CNN as Decoder: The core contribution is the exploration of dilated CNN architectures for the VAE decoder (see the sketch after this list). This architecture offers explicit control over how much contextual information the decoder uses during generation: varying the dilation rates and network depth adjusts the receptive field, letting the model trade off contextual capacity against reliance on the latent representation.
- Empirical Results: The research demonstrates that, with appropriate calibration of the decoder's contextual capacity, VAEs with dilated CNN decoders outperform standard LSTM language models in perplexity. The paper reports perplexity gains on both the Yahoo Answers and Yelp15 review datasets, marking notable improvements in language modeling under this strategy.
- Semi-supervised and Unsupervised Learning Improvements: The paper also extends the dilated CNN VAE framework to semi-supervised and unsupervised settings, achieving higher classification accuracy than existing baselines. Notably, the semi-supervised VAE outperformed traditional LSTM-based methods when labeled data were scarce, showing its robustness in low-resource settings.
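To make the contextual-capacity knob concrete, below is a minimal sketch of a dilated, causal convolutional decoder conditioned on a latent code z. It assumes PyTorch; the layer widths, kernel size, and dilation schedule are illustrative choices, not the paper's exact configuration.

```python
# Minimal sketch of a dilated-CNN text decoder conditioned on a latent code z.
# Assumes PyTorch; hyperparameters (embed_dim, channels, dilations) are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DilatedConvDecoder(nn.Module):
    def __init__(self, vocab_size, embed_dim=256, latent_dim=32,
                 channels=256, kernel_size=3, dilations=(1, 2, 4, 8)):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # The latent code is broadcast to every time step and concatenated
        # with the token embedding, so every prediction can condition on z.
        self.input_proj = nn.Conv1d(embed_dim + latent_dim, channels, 1)
        self.convs = nn.ModuleList([
            nn.Conv1d(channels, channels, kernel_size, dilation=d)
            for d in dilations
        ])
        self.dilations = dilations
        self.kernel_size = kernel_size
        self.out = nn.Conv1d(channels, vocab_size, 1)

    def forward(self, tokens, z):
        # tokens: (batch, seq_len) previously generated tokens; z: (batch, latent_dim)
        x = self.embed(tokens)                              # (B, T, E)
        z_rep = z.unsqueeze(1).expand(-1, x.size(1), -1)    # (B, T, Z)
        x = torch.cat([x, z_rep], dim=-1).transpose(1, 2)   # (B, E+Z, T)
        h = self.input_proj(x)
        for conv, d in zip(self.convs, self.dilations):
            # Left-pad so the convolution is causal: position t sees only <= t.
            pad = (self.kernel_size - 1) * d
            h = h + F.relu(conv(F.pad(h, (pad, 0))))        # residual connection
        return self.out(h).transpose(1, 2)                  # (B, T, vocab) logits
```

Fewer layers or smaller dilations shrink the receptive field over previously generated tokens, pushing the model to rely more on z; deeper stacks with larger dilations behave more like an unconstrained language model.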
Numerical Results and Evaluation
The paper presents comprehensive evaluations, using perplexity and negative log-likelihood (NLL) as the primary metrics. For language modeling, the models equipped with dilated CNN decoders showed clear improvements in both perplexity and NLL compared with LSTM-based VAEs. These gains are attributed to an effective trade-off between contextual capacity and reliance on the latent variable, made possible by the flexibility of the CNN architecture.
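For reference, the quantity being optimized and reported is the standard evidence lower bound (ELBO); the notation below is ours, not copied from the paper. Because the ELBO lower-bounds the log-likelihood, perplexities derived from it are upper bounds on the true perplexity.

```latex
\log p_\theta(x) \;\ge\; \mathbb{E}_{q_\phi(z \mid x)}\!\left[\log p_\theta(x \mid z)\right]
  \;-\; \mathrm{KL}\!\left(q_\phi(z \mid x) \,\|\, p(z)\right) \;=\; \mathrm{ELBO}(x)

% Reported NLL uses the bound, and perplexity is its per-token exponential
% over a corpus \mathcal{D} containing N tokens in total:
\mathrm{NLL}(x) \;\le\; -\,\mathrm{ELBO}(x), \qquad
\mathrm{PPL} \;=\; \exp\!\Big(\tfrac{1}{N} \sum_{x \in \mathcal{D}} \mathrm{NLL}(x)\Big)
```

A decoder with a very large receptive field can drive the reconstruction term up while the KL term collapses toward zero; constraining the decoder's context is what keeps the latent variable in use.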
In semi-supervised settings, the model performed particularly well when few labeled samples were available, highlighting its ability to leverage unlabeled data. For instance, the semi-supervised VAE configurations achieved notable gains in classification accuracy over models initialized with language models or sequence autoencoders, particularly in the smaller labeled-sample regimes.
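As a point of reference, semi-supervised VAEs of this kind typically follow the objective of Kingma et al. (2014): a labeled term, an unlabeled term that marginalizes over the classifier's prediction, and an explicit classification loss. The weighting α and exact parameterization below are illustrative assumptions rather than the paper's precise setup; all terms are maximized jointly.

```latex
% Lower bound for a labeled pair (x, y):
\mathcal{L}(x, y) = \mathbb{E}_{q_\phi(z \mid x, y)}\!\left[\log p_\theta(x \mid y, z)\right]
  + \log p(y) - \mathrm{KL}\!\left(q_\phi(z \mid x, y) \,\|\, p(z)\right)

% Lower bound for an unlabeled x, marginalizing over the classifier q_\phi(y \mid x):
\mathcal{U}(x) = \sum_{y} q_\phi(y \mid x)\, \mathcal{L}(x, y)
  + \mathcal{H}\!\left(q_\phi(y \mid x)\right)

% Joint objective over labeled set \mathcal{D}_\ell and unlabeled set \mathcal{D}_u:
\mathcal{J} = \sum_{(x, y) \in \mathcal{D}_\ell} \mathcal{L}(x, y)
  + \sum_{x \in \mathcal{D}_u} \mathcal{U}(x)
  + \alpha \sum_{(x, y) \in \mathcal{D}_\ell} \log q_\phi(y \mid x)
```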
Implications and Future Directions
The research has substantial implications for the design of generative models in NLP. Using dilated CNNs as VAE decoders not only improves language modeling but also strengthens performance in semi-supervised and unsupervised scenarios. This suggests broader potential for variational frameworks to capture complex text properties, which could be pivotal for advancing unsupervised learning approaches in NLP.
Future research might extend this architecture to modalities beyond text to test how well these findings generalize. Investigating hybrid models that integrate more sophisticated priors or hierarchical latent structures could further enhance the expressivity and efficiency of VAEs in diverse applications. A better understanding of the interplay among encoder architecture, decoder architecture, and training regime could unlock further advances in generative modeling.