Semantic Sentence Matching with Densely-connected Recurrent and Co-attentive Information (1805.11360v2)

Published 29 May 2018 in cs.CL

Abstract: Sentence matching is widely used in various natural language tasks such as natural language inference, paraphrase identification, and question answering. For these tasks, understanding the logical and semantic relationship between two sentences is required but remains challenging. Although attention mechanisms are useful for capturing the semantic relationship and properly aligning the elements of two sentences, previous attention methods simply use a summation operation, which does not sufficiently retain the original features. Inspired by DenseNet, a densely connected convolutional network, we propose a densely-connected co-attentive recurrent neural network, each layer of which uses concatenated information of attentive features as well as hidden features of all the preceding recurrent layers. This enables preserving the original and the co-attentive feature information from the bottommost word embedding layer to the uppermost recurrent layer. To alleviate the problem of an ever-increasing size of feature vectors due to dense concatenation operations, we also propose using an autoencoder after dense concatenation. We evaluate our proposed architecture on highly competitive benchmark datasets related to sentence matching. Experimental results show that our architecture, which retains recurrent and attentive features, achieves state-of-the-art performance on most of the tasks.

Authors (3)
  1. Seonhoon Kim (6 papers)
  2. Inho Kang (7 papers)
  3. Nojun Kwak (116 papers)
Citations (215)

Summary

Semantic Sentence Matching with Densely-connected Recurrent and Co-attentive Information

The paper presents a novel approach to semantic sentence matching in NLP through the introduction of a Densely-connected Recurrent and Co-attentive neural Network (DRCN). This architecture aims to advance tasks such as natural language inference, paraphrase identification, and question answering by improving the ability to comprehend the logical and semantic relationships between sentences.

The proposed architecture is inspired by DenseNet, a densely connected convolutional network. Instead of the summation operation commonly used to merge attentive features, which can dilute key feature information, DRCN relies on dense connectivity: each layer takes as input the concatenation of the attentive features and the recurrent hidden features of all preceding layers, so that information from the lowest layers, down to the word embeddings, is preserved throughout the network hierarchy. This contrasts with earlier models, whose simpler attention mechanisms discard much of that feature detail. A rough sketch of one such block is given below.
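To make the idea concrete, the following PyTorch-style sketch shows one densely-connected recurrent block with dot-product co-attention. The class and parameter names (`DenseCoAttentiveLayer`, `hidden_dim`), the bidirectional LSTM, and the dot-product similarity are assumptions made for exposition, not the authors' exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DenseCoAttentiveLayer(nn.Module):
    """One densely-connected recurrent + co-attention block (illustrative)."""

    def __init__(self, input_dim, hidden_dim):
        super().__init__()
        # Recurrent encoder over the features concatenated from all
        # preceding layers (plus the word embeddings at the bottom).
        self.rnn = nn.LSTM(input_dim, hidden_dim, batch_first=True,
                           bidirectional=True)

    def forward(self, p, q):
        # p, q: (batch, len_p, input_dim) and (batch, len_q, input_dim)
        h_p, _ = self.rnn(p)                      # (batch, len_p, 2*hidden)
        h_q, _ = self.rnn(q)                      # (batch, len_q, 2*hidden)

        # Co-attention: soft-align each position of one sentence with the
        # other via a dot-product similarity matrix.
        e = torch.bmm(h_p, h_q.transpose(1, 2))   # (batch, len_p, len_q)
        a_p = torch.bmm(F.softmax(e, dim=2), h_q)                      # attended q, aligned to p
        a_q = torch.bmm(F.softmax(e, dim=1).transpose(1, 2), h_p)      # attended p, aligned to q

        # Dense connectivity: concatenate (rather than sum) the layer input,
        # the recurrent features, and the co-attentive features, so that
        # lower-level information is carried upward unchanged.
        p_out = torch.cat([p, h_p, a_p], dim=-1)
        q_out = torch.cat([q, h_q, a_q], dim=-1)
        return p_out, q_out
```

Because each block returns the concatenation of its input, its recurrent output, and its attended output, the feature dimension grows with every layer; this is precisely the growth that the autoencoder described next is meant to curb.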

To address the growth in feature-vector size caused by dense connectivity, the researchers insert an autoencoder after the concatenation. It acts as a bottleneck, reducing the dimensionality of the concatenated features while preserving their salient information, which keeps deeper models tractable without sacrificing performance.
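A minimal sketch of such a bottleneck is shown below, again under assumed details: a single linear encoder/decoder pair with a ReLU and a reconstruction term added to the training loss. The paper specifies only that an autoencoder follows the dense concatenation; the layer sizes, activation, and loss weighting here are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Bottleneck(nn.Module):
    """Autoencoder that compresses the concatenated DRCN features.

    Only the encoder output is passed on to the next recurrent layer;
    the decoder exists so a reconstruction loss can push the compressed
    code to retain the original information.
    """

    def __init__(self, input_dim, code_dim):
        super().__init__()
        self.encoder = nn.Linear(input_dim, code_dim)
        self.decoder = nn.Linear(code_dim, input_dim)

    def forward(self, x):
        code = torch.relu(self.encoder(x))   # reduced-dimension features
        recon = self.decoder(code)           # reconstruction of the input
        return code, recon

# Illustrative training-time usage (lambda_rec is a hypothetical weight):
# code, recon = bottleneck(concatenated_features)
# loss = task_loss + lambda_rec * F.mse_loss(recon, concatenated_features)
```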

The DRCN was evaluated on multiple challenging datasets, namely SNLI, MultiNLI, the Quora question pairs, TrecQA, and SelQA, achieving state-of-the-art or highly competitive results on most of them. Notably, the model reached 88.9% accuracy on the SNLI test set, and 90.1% with an ensemble, underscoring the quality of its semantic representations.

The implications of this research are significant both practically and theoretically. The DRCN’s architecture allows for improved semantic understanding in sentence matching tasks without the need for external knowledge bases, thus offering a more efficient and scalable solution for NLP applications. The work opens further avenues for exploring deeper recurrent models and their stability when combined with attention mechanisms, which has long been a challenging endeavor due to the vanishing gradient problem.

Future work might incorporate more advanced autoencoding techniques or more diverse attention mechanisms, potentially improving both efficiency and accuracy. Additionally, while DRCN already achieves strong results without relying on contextual embeddings, integrating such representations could further improve its performance on complex, multi-layered language understanding tasks.

In conclusion, the DRCN offers a compelling advancement in the field of semantic sentence matching, contributing both novel methodology and robust performance results, and paves the way for future investigation to refine and expand upon this foundational architecture.