A Decomposable Attention Model for Natural Language Inference
The paper, "A Decomposable Attention Model for Natural Language Inference," presents a novel neural architecture specifically designed for solving the natural language inference (NLI) problem. NLI, the task of determining the entailment and contradiction relationships between a premise and a hypothesis, is central to natural language understanding. Utilizing the Stanford Natural Language Inference (SNLI) dataset, the authors report state-of-the-art results with significantly fewer parameters compared to previous models.
Introduction
The authors introduce an approach to NLI built around a decomposable attention model. Unlike models that rely on complex sentence encoders such as CNNs or LSTMs, it aligns local substructure between the two sentences and solves the resulting subproblems separately. This decomposition admits substantial parallelism and keeps the computation efficient.
Approach
The decomposable attention model comprises three main components (a minimal code sketch follows the list):
- Attend: The two input sentences are soft-aligned with an attention mechanism, pairing each word with a weighted subphrase of the other sentence.
- Compare: Each word and its aligned subphrase are compared separately with a feed-forward network, producing one comparison vector per position.
- Aggregate: The comparison vectors are summed over each sentence, and the two sums are passed to a final classifier.
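To make the three steps concrete, here is a minimal NumPy sketch of the forward pass under simplifying assumptions: the two-layer `ff` helper stands in for the paper's feed-forward networks F, G, and H, and all dimensions and weights below are illustrative choices, not the published configuration.

```python
import numpy as np

def ff(x, W1, W2):
    """Two-layer feed-forward net (ReLU hidden layer); stands in for F, G, H."""
    return np.maximum(0.0, x @ W1) @ W2

def softmax(z, axis):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def decomposable_attention(a, b, params):
    """a: (la, d) premise embeddings, b: (lb, d) hypothesis embeddings."""
    F1, F2, G1, G2, H1, H2 = params

    # Attend: unnormalised alignment scores e_ij = F(a_i) . F(b_j), then
    # soft-align each word with a weighted subphrase of the other sentence.
    e = ff(a, F1, F2) @ ff(b, F1, F2).T           # (la, lb)
    beta = softmax(e, axis=1) @ b                 # subphrase of b aligned to each a_i
    alpha = softmax(e, axis=0).T @ a              # subphrase of a aligned to each b_j

    # Compare: each word concatenated with its aligned subphrase goes through G.
    v1 = ff(np.concatenate([a, beta], axis=1), G1, G2)
    v2 = ff(np.concatenate([b, alpha], axis=1), G1, G2)

    # Aggregate: sum the comparison vectors per sentence and classify with H.
    v = np.concatenate([v1.sum(axis=0), v2.sum(axis=0)])
    return ff(v[None, :], H1, H2)                 # (1, 3) scores for the NLI classes

# Toy forward pass with random weights (shapes only; no training here).
rng = np.random.default_rng(0)
d = h = 200
params = (rng.normal(0, 0.1, (d, h)),     rng.normal(0, 0.1, (h, h)),   # F
          rng.normal(0, 0.1, (2 * d, h)), rng.normal(0, 0.1, (h, h)),   # G
          rng.normal(0, 0.1, (2 * h, h)), rng.normal(0, 0.1, (h, 3)))   # H
scores = decomposable_attention(rng.normal(size=(7, d)), rng.normal(size=(5, d)), params)
```

In the paper the networks are trained end-to-end on SNLI on top of pretrained word embeddings; the sketch above shows only the forward pass with random weights.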
Because each position is processed independently, the architecture parallelizes naturally across sentence length, offering considerable speedups. An optional intra-sentence attention step further improves performance by injecting a minimal amount of word-order and context information within each sentence.
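A rough sketch of this optional step, reusing the `ff` and `softmax` helpers and the `rng`, `d`, `h` variables from the sketch above; the bucketed distance bias mirrors the paper's distance-sensitive bias terms, but the bucket count and weight shapes here are assumptions.

```python
def intra_attention(a, Fi1, Fi2, dist_bias):
    """Self-align a sentence with itself and append the result to each word,
    injecting a small amount of word-order information via distance biases."""
    la = a.shape[0]
    proj = ff(a, Fi1, Fi2)
    f = proj @ proj.T                                        # (la, la) self-alignment scores
    idx = np.arange(la)
    dist = np.minimum(np.abs(idx[:, None] - idx[None, :]),   # clip long distances into
                      len(dist_bias) - 1)                    # a shared final bucket
    a_self = softmax(f + dist_bias[dist], axis=1) @ a        # distance-biased alignment
    return np.concatenate([a, a_self], axis=1)               # each word becomes [a_i ; a'_i]

# Toy usage: 11 bias buckets, so distances beyond 10 share one bias.
bias = rng.normal(0, 0.1, 11)
a_bar = intra_attention(rng.normal(size=(7, d)),
                        rng.normal(0, 0.1, (d, h)), rng.normal(0, 0.1, (h, h)), bias)
```

In the full model, these concatenated vectors replace the raw embeddings as input to the attend step, which is how the model sees local context without any recurrence.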
Empirical Results
Using the SNLI dataset as a benchmark, the proposed model outperforms a range of more elaborate neural architectures in both accuracy and parameter count. Specifically:
- The vanilla model achieves test accuracy of 86.3%, using only 382K parameters.
- An enhanced version incorporating intra-sentence attention attains 86.8% test accuracy with 582K parameters.
For comparison, SPINN-PI achieves 83.2% with 3.7M parameters, and the LSTMN with deep attention fusion achieves 86.3% with 3.4M parameters.
Computational Complexity
The complexity analysis shows that the approach is asymptotically comparable to a vanilla LSTM encoder while being far easier to parallelize. Under the assumption that sentence length ℓ is smaller than the embedding dimension d, the all-pairs attention term is dominated by the position-wise feed-forward computation, so overall efficiency is maintained.
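Stated as a back-of-the-envelope bound in the paper's notation (ℓ: sentence length, d: embedding/hidden dimension; the breakdown below is a paraphrase of the argument, with constant factors omitted):

```latex
\underbrace{O(\ell d^2)}_{\text{position-wise nets } F,\, G,\, H}
\;+\;
\underbrace{O(\ell^2 d)}_{\text{all-pairs attention}}
\;=\; O\!\bigl(\ell d^2 \,(1 + \ell/d)\bigr)
\;=\; O(\ell d^2)
\quad \text{when } \ell < d .
```

This matches the O(ℓ d²) cost of an LSTM encoder with d hidden units, but without the sequential dependency between time steps, which is where the parallelism comes from.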
Discussion and Error Analysis
Extensive evaluation indicates that the model excels when there is high lexical overlap between the premise and hypothesis, and that the intra-sentence attention variant helps most in cases requiring compositional or context-sensitive interpretation. The model still struggles with nuanced phenomena such as numerical inference, inferences that depend on sequential order, and cases that hinge on a single token.
Implications and Future Directions
The significant reduction in parameters demonstrates that simpler architectures with effective attention mechanisms can rival and even outperform complex deep learning models in specific natural language tasks.
From a theoretical standpoint, the findings prompt a re-evaluation of the necessity of deep hierarchical representations in favor of more decomposable, parallel-friendly models. Practically, this opens up avenues for deploying efficient NLI systems in resource-constrained environments.
Future work could involve exploring richer embeddings and further fine-tuning intra-sentence attention mechanisms. Additionally, extending the model to handle more diverse and harder inference tasks could help address the observed deficiencies.
In conclusion, the decomposable attention model introduces a promising direction for NLI research and applications, illustrating that intelligent decomposition combined with attention can yield substantial gains in both accuracy and computational efficiency.