Language as a Latent Variable: Discrete Generative Models for Sentence Compression (1609.07317v2)

Published 23 Sep 2016 in cs.CL and cs.AI

Abstract: In this work we explore deep generative models of text in which the latent representation of a document is itself drawn from a discrete language model distribution. We formulate a variational auto-encoder for inference in this model and apply it to the task of compressing sentences. In this application the generative model first draws a latent summary sentence from a background language model, and then subsequently draws the observed sentence conditioned on this latent summary. In our empirical evaluation we show that generative formulations of both abstractive and extractive compression yield state-of-the-art results when trained on a large amount of supervised data. Further, we explore semi-supervised compression scenarios where we show that it is possible to achieve performance competitive with previously proposed supervised models while training on a fraction of the supervised data.

Citations (221)

Summary

  • The paper presents a novel VAE-based approach that integrates a pointer network for effective sentence compression.
  • The methodology leverages both supervised and semi-supervised training through ASC and FSC models to reduce reliance on extensive labeled data.
  • Empirical results on the Gigaword dataset demonstrate state-of-the-art performance, with unlabelled data enhancing precision and F-1 scores.

Exploring Discrete Generative Models for Sentence Compression

The paper "Language as a Latent Variable: Discrete Generative Models for Sentence Compression" presents an innovative approach to sentence compression by leveraging deep generative models of text with a discrete latent variable framework. The authors propose a novel application of variational auto-encoders (VAEs), tailored to a scenario where the latent representations are extracted from a discrete LLM distribution, thereby focusing on generating compressed sentences.

Model Architecture and Methodology

The proposed architecture revolves around the auto-encoding sentence compression (ASC) model, which comprises four recurrent neural networks: an encoder, a compressor, a decoder, and a language model. The encoder reads the source sentence and extracts its features, the compressor defines the variational distribution over candidate compressions (the latent summary), the decoder reconstructs the original sentence conditioned on a sampled compression, and the language model serves as the prior over latent summaries.
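Written generically (the notation here is ours, not necessarily the paper's), with $x$ the observed sentence, $c$ the latent compression, $q_\phi(c\mid x)$ the compressor, $p_\theta(x\mid c)$ the decoder, and $p(c)$ the language-model prior, training maximises the standard variational lower bound:

$$
\log p(x) \;\ge\; \mathcal{L}(\theta,\phi) \;=\; \mathbb{E}_{q_\phi(c\mid x)}\!\big[\log p_\theta(x\mid c)\big] \;-\; D_{\mathrm{KL}}\!\big(q_\phi(c\mid x)\,\big\|\,p(c)\big).
$$

Because $c$ is a discrete word sequence, the expectation cannot be reparameterised and must instead be estimated by sampling, which is what motivates the pointer network and REINFORCE machinery described next.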

A fundamental innovation in the paper is the use of a pointer network as the compression model. Restricting candidate output words to those that appear in the input sentence sharply reduces the sampling space, which eases optimisation of the variational lower bound with the REINFORCE algorithm and mitigates the high variance that typically plagues sampling-based variational inference over discrete latent variables.
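As an illustration of the mechanism, a pointer-network step turns the current hidden state and the encoder outputs into a distribution over source positions, so every sampled word is guaranteed to come from the input sentence. The sketch below uses plain dot-product attention for brevity; the paper's exact parameterisation may differ. With the score-function (REINFORCE) estimator, the compressor gradient weights $\nabla_\phi \log q_\phi(c\mid x)$ by the learning signal $\log p_\theta(x\mid c) + \log p(c) - \log q_\phi(c\mid x)$, usually with a baseline subtracted to curb variance.

```python
import torch
import torch.nn.functional as F

def pointer_step(decoder_state: torch.Tensor,
                 encoder_states: torch.Tensor) -> torch.Tensor:
    """One decoding step of a pointer network (illustrative sketch).

    decoder_state:  (batch, hidden)          current compressor hidden state
    encoder_states: (batch, src_len, hidden) encoder outputs, one per source word
    Returns a distribution over source positions, so sampled words always
    come from the input sentence.
    """
    # Dot-product attention scores over source positions.
    scores = torch.einsum('bh,bsh->bs', decoder_state, encoder_states)
    # Softmax over positions: the output vocabulary is restricted to the
    # words of the input, which shrinks the sampling space considerably.
    return F.softmax(scores, dim=-1)

# Shape check with random tensors.
probs = pointer_step(torch.randn(2, 16), torch.randn(2, 7, 16))
print(probs.shape)  # torch.Size([2, 7]) -- one probability per source token
```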

Additionally, the paper introduces a forced-attention sentence compression (FSC) model for supervised learning. FSC combines the pointer network with a standard softmax over the full output vocabulary, bridging extractive and abstractive compression. Because FSC shares the pointer network with ASC, the two models can be trained jointly on a mixture of labelled and unlabelled data, and the combined model benefits from robust gradient signals in both settings (the mixed objective is sketched below).
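The mixed objective can be sketched as follows; the function name and the flat per-example log-probabilities are hypothetical conveniences, and in practice the compressor gradient for the unsupervised term is estimated with REINFORCE rather than by direct back-propagation through the sample.

```python
import torch

def semi_supervised_loss(sup_logp: torch.Tensor,    # log p(c | x) on labelled pairs (FSC)
                         rec_logp: torch.Tensor,    # log p(x | c), c ~ q(c | x), unlabelled (ASC)
                         prior_logp: torch.Tensor,  # log p(c) under the language-model prior
                         q_logp: torch.Tensor,      # log q(c | x) under the compressor
                         unsup_weight: float = 1.0) -> torch.Tensor:
    """Combine the supervised FSC term with the unsupervised ASC lower bound.

    All inputs are per-example log-probabilities assumed to be computed
    elsewhere; only the loss arithmetic is shown here.
    """
    # Supervised term: maximise the likelihood of gold compressions.
    supervised = sup_logp.mean()
    # Unsupervised term: single-sample estimate of the variational lower bound
    # E_q[log p(x | c) + log p(c) - log q(c | x)].
    elbo = (rec_logp + prior_logp - q_logp).mean()
    # Return a quantity to minimise (negative of the combined objective).
    return -(supervised + unsup_weight * elbo)

# Dummy call, just to show the arithmetic runs.
print(semi_supervised_loss(torch.randn(8), torch.randn(8),
                           torch.randn(8), torch.randn(8)).item())
```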

Empirical Evaluation and Results

The empirical evaluation showcases the model's performance on the Gigaword sentence compression dataset. The results indicate that both the ASC and the combined ASC+FSC models achieve state-of-the-art results under supervised conditions and manage to approximate these results even in semi-supervised settings with reduced labelled data. Specifically, models trained on a small fraction of the labelled dataset attain performance levels akin to models dependent entirely on vast amounts of labelled data. This highlights the approach’s efficiency in minimizing the cost and effort associated with acquiring extensive training annotations.

A critical observation concerns the effect of unlabelled data on extractive compression performance: incorporating unlabelled data notably raises precision, which in turn lifts the F-1 scores. On abstractive compression, the ASC+FSC framework outperforms existing benchmarks on the dataset, underscoring how effectively unlabelled data can be exploited through the proposed generative-discriminative training regime.

Theoretical Implications and Future Directions

Conceptually, this work solidifies the applicability of VAEs to hybrid supervised/unsupervised scenarios in natural language processing. Treating language itself as the latent variable suggests extensions of the framework to other NLP domains, such as machine translation or sentiment analysis, where modelling latent semantic transformations can offer fresh insights.

Future research could delve into exploiting even larger-scale unlabelled corpora, potentially incorporating self-supervised learning techniques to further harness the power of unlabelled data. Additionally, exploring the role of the KL divergence scaling factor in regulating training dynamics offers a mathematical challenge and a practical opportunity to refine the balance between inference flexibility and model regularization.
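For concreteness, one common way such a scaling factor enters the objective (a generic VAE training heuristic, not necessarily the paper's exact formulation) is

$$
\mathcal{L}_\lambda \;=\; \mathbb{E}_{q_\phi(c\mid x)}\!\big[\log p_\theta(x\mid c)\big] \;-\; \lambda\, D_{\mathrm{KL}}\!\big(q_\phi(c\mid x)\,\big\|\,p(c)\big),
$$

where $\lambda < 1$ weakens the pull toward the language-model prior early in training and $\lambda = 1$ recovers the true lower bound.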

In summary, the paper presents a significant step forward in leveraging deep generative models for sentence compression, effectively demonstrating the power of a combined approach of supervised and unsupervised learning, and paving the way for scalable, data-efficient solutions in the broader field of NLP.