DiSAN: Directional Self-Attention Network for RNN/CNN-Free Language Understanding (1709.04696v3)

Published 14 Sep 2017 in cs.CL and cs.AI

Abstract: Recurrent neural nets (RNN) and convolutional neural nets (CNN) are widely used on NLP tasks to capture the long-term and local dependencies, respectively. Attention mechanisms have recently attracted enormous interest due to their highly parallelizable computation, significantly less training time, and flexibility in modeling dependencies. We propose a novel attention mechanism in which the attention between elements from input sequence(s) is directional and multi-dimensional (i.e., feature-wise). A light-weight neural net, "Directional Self-Attention Network (DiSAN)", is then proposed to learn sentence embedding, based solely on the proposed attention without any RNN/CNN structure. DiSAN is only composed of a directional self-attention with temporal order encoded, followed by a multi-dimensional attention that compresses the sequence into a vector representation. Despite its simple form, DiSAN outperforms complicated RNN models on both prediction quality and time efficiency. It achieves the best test accuracy among all sentence encoding methods and improves the most recent best result by 1.02% on the Stanford Natural Language Inference (SNLI) dataset, and shows state-of-the-art test accuracy on the Stanford Sentiment Treebank (SST), Multi-Genre natural language inference (MultiNLI), Sentences Involving Compositional Knowledge (SICK), Customer Review, MPQA, TREC question-type classification and Subjectivity (SUBJ) datasets.

Citations (728)

Summary

  • The paper presents multi-dimensional and directional self-attention to capture feature-wise scores and temporal dependencies in sentence encoding.
  • It eliminates RNN and CNN architectures with a lightweight design that significantly reduces computational cost and training time.
  • Experimental results on SNLI, SST, and MultiNLI highlight improved accuracy and efficiency over traditional models.

DiSAN: Directional Self-Attention Network for RNN/CNN-Free Language Understanding

The paper "DiSAN: Directional Self-Attention Network for RNN/CNN-Free Language Understanding" by Tao Shen et al. presents a novel approach to sentence encoding tasks in NLP, leveraging self-attention mechanisms to eschew traditional Recurrent Neural Networks (RNN) and Convolutional Neural Networks (CNN). This work introduces two main innovations, multi-dimensional attention and directional self-attention, to enhance the flexibility and efficiency of neural networks in capturing dependencies and modeling temporal information.

Multi-Dimensional and Directional Self-Attention

The proposed multi-dimensional attention goes beyond traditional attention mechanisms by computing a feature-wise score vector rather than a single scalar alignment score per token. Each feature of an input token is therefore weighted independently, so the model can attend to different tokens for different features of the representation instead of collapsing each token's relevance into one weight.
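
As a rough illustration, the sketch below computes feature-wise attention in NumPy. It is not the authors' code: the weight shapes, the tanh activation, and the toy dimensions are assumptions made for brevity rather than the paper's exact parameterization.

```python
import numpy as np

def multi_dim_attention(X, q, W1, W2, W, b1, b):
    """Feature-wise (multi-dimensional) attention over n tokens.

    X : (n, d) token embeddings; q : (d,) query vector.
    Returns a (d,) summary in which every feature has its own
    softmax distribution over the n tokens.
    """
    # One d-dimensional alignment score per token instead of a scalar.
    scores = np.tanh(X @ W1 + q @ W2 + b1) @ W + b          # (n, d)
    # Softmax over the token axis, independently for each feature.
    scores -= scores.max(axis=0, keepdims=True)
    probs = np.exp(scores) / np.exp(scores).sum(axis=0, keepdims=True)
    # Feature-wise weighted sum of the token embeddings.
    return (probs * X).sum(axis=0)                          # (d,)

# Toy usage with random inputs and weights.
rng = np.random.default_rng(0)
n, d = 5, 8
X, q = rng.standard_normal((n, d)), rng.standard_normal(d)
W1, W2, W = (rng.standard_normal((d, d)) for _ in range(3))
b1, b = np.zeros(d), np.zeros(d)
print(multi_dim_attention(X, q, W1, W2, W, b1, b).shape)    # (8,)
```

The key difference from scalar attention is the shape of `probs`: it is (n, d) rather than (n,), so each of the d features gets its own mixture over the sequence.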

Directional self-attention (DiSA), on the other hand, addresses the challenge of encoding temporal order within attention. Using positional masks, DiSA models directional dependencies between tokens in a sequence, in contrast to conventional self-attention, which is order-agnostic on its own. One mask restricts each token to attend only to preceding tokens and the other only to following tokens (with the diagonal disabled), so the forward and backward masks together capture dependencies in both directions and enrich the model's contextual representation.
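
A minimal NumPy sketch of this directional masking follows. Adding a large negative constant before the softmax stands in for the paper's minus-infinity mask entries, and the scoring function here is a plain scaled dot product rather than the paper's multi-dimensional score, purely to keep the example short.

```python
import numpy as np

NEG = -1e9  # large negative stand-in for -inf; softmax drives masked entries to ~0

def directional_masks(n):
    """Build the two positional masks: one restricts attention to earlier
    tokens, the other to later tokens; the diagonal is disabled in both."""
    i, j = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    to_earlier = np.where(i > j, 0.0, NEG)   # row i may attend only to j < i
    to_later = np.where(i < j, 0.0, NEG)     # row i may attend only to j > i
    return to_earlier, to_later

def masked_attention(X, mask):
    """Directionally masked self-attention with simple scaled dot-product scores."""
    d = X.shape[1]
    scores = (X @ X.T) / np.sqrt(d) + mask   # (n, n) pairwise scores plus mask
    scores -= scores.max(axis=-1, keepdims=True)
    probs = np.exp(scores)
    probs /= probs.sum(axis=-1, keepdims=True)
    return probs @ X                         # (n, d) direction-aware context vectors

# Toy usage: a 4-token sequence attended in each direction separately.
# Note: rows with no admissible positions (the first token under the "earlier"
# mask, the last under the "later" mask) degrade to an unmasked softmax in this
# toy; the full DiSA block also mixes in the raw input through a fusion gate.
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 8))
to_earlier, to_later = directional_masks(4)
ctx_bw, ctx_fw = masked_attention(X, to_earlier), masked_attention(X, to_later)
```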

DiSAN Architecture and Its Efficacy

The architecture of the Directional Self-Attention Network (DiSAN) combines these attention mechanisms into a lightweight, RNN/CNN-free structure. The network applies forward and backward directional self-attention blocks to the input sequence, concatenates their outputs, and then employs multi-dimensional source-to-token self-attention to compress the sequence into a single sentence encoding (see the sketch below). This structure avoids the sequential processing bottleneck of RNNs and the fixed receptive fields of CNNs, making DiSAN efficient in both parameter count and computation time.
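
The sketch below strings these pieces together into a DiSAN-style encoder. It is a simplified, hypothetical rendering: the directional blocks use scalar dot-product scores and an illustrative fusion gate rather than the paper's multi-dimensional, learned parameterizations, and the source-to-token scoring is a toy stand-in.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def disa_block(X, mask):
    """Simplified directional self-attention block (scalar scores for brevity;
    the paper uses multi-dimensional, feature-wise scores)."""
    d = X.shape[1]
    scores = (X @ X.T) / np.sqrt(d) + mask       # (n, n) masked token-to-token scores
    context = softmax(scores, axis=-1) @ X       # (n, d) directional context per token
    gate = 1.0 / (1.0 + np.exp(-(X + context)))  # illustrative fusion gate (learned in the paper)
    return gate * context + (1.0 - gate) * X     # gated mix of context and raw input

def source2token(H):
    """Multi-dimensional source-to-token attention: compress (n, d) to a (d,) vector."""
    probs = softmax(H, axis=0)                   # feature-wise softmax over tokens (toy scoring)
    return (probs * H).sum(axis=0)

def disan_encode(X):
    """DiSAN-style encoding: two directional blocks, concatenate, then source2token."""
    n = X.shape[0]
    i, j = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    to_earlier = np.where(i > j, 0.0, -1e9)
    to_later = np.where(i < j, 0.0, -1e9)
    H = np.concatenate([disa_block(X, to_earlier), disa_block(X, to_later)], axis=1)  # (n, 2d)
    return source2token(H)                                                            # (2d,)

rng = np.random.default_rng(0)
print(disan_encode(rng.standard_normal((6, 16))).shape)  # (32,)
```

Because every step is a masked matrix product followed by element-wise operations, the whole encoder can be computed in parallel over the sequence, which is the source of the efficiency gains discussed below.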

Experimental Results

DiSAN demonstrates state-of-the-art performance on a range of NLP benchmarks, including the Stanford Natural Language Inference (SNLI) corpus, the Stanford Sentiment Treebank (SST), and MultiNLI, among others. On SNLI, DiSAN improves the best previous sentence-encoding result by 1.02%, reaching a test accuracy of 85.62% while incurring significantly lower computational cost than traditional RNN/CNN models.

Similarly, in fine-grained sentiment classification on the SST dataset, DiSAN outperformed other models with a test accuracy of 51.72%, 0.52% higher than the previous best result. Further experiments on MultiNLI, SICK, and several sentence classification tasks reaffirm DiSAN's accuracy and efficiency.

Implications and Future Work

DiSAN demonstrates that attention mechanisms, when enhanced with multi-dimensional and directional capabilities, can effectively supplant traditional RNNs and CNNs for sentence encoding. The approach not only simplifies the model architecture but is also far more amenable to parallelization, significantly reducing training time.

The practical implications of this research are noteworthy. Reduced training times and fewer parameters lower the barriers to deploying complex NLP models in production environments. Moreover, the enhanced ability to model dependencies precisely and efficiently positions DiSAN as a versatile tool for a wide array of NLP tasks.

Future research could expand on this work by exploring its integration into more complex systems such as those used in question answering and reading comprehension. Additionally, leveraging multi-dimensional and directional attention mechanisms could catalyze advancements in other domains requiring sophisticated sequence modeling and understanding.

In conclusion, the paper establishes DiSAN as a pivotal step towards more efficient and effective NLP models, underscoring the potential of attention mechanisms to redefine neural network architectures.
