
Effects of padding on LSTMs and CNNs

Published 18 Mar 2019 in cs.LG, cs.CL, cs.IR, and stat.ML | (1903.07288v1)

Abstract: Long Short-Term Memory (LSTM) networks and Convolutional Neural Networks (CNNs) have become very common and are used in many fields, as they have been effective in solving many problems where general neural networks were inefficient. They have been applied to various problems, mostly related to images and sequences. Since LSTMs and CNNs take inputs of the same length and dimension, input images and sequences are padded to a maximum length during training and testing. This padding can affect the way the networks function and can make a great difference in performance and accuracy. This paper studies this effect and suggests the best way to pad an input sequence, using a simple sentiment analysis task for this purpose. We use the same dataset on both networks with various padding schemes to show the difference. The paper also discusses some preprocessing techniques applied to the data to ensure effective analysis.

Citations (86)

Summary

  • The paper demonstrates that for LSTMs, pre-padding achieves approximately 80% accuracy versus about 50% with post-padding.
  • It shows that CNN performance remains largely consistent around 75% accuracy irrespective of the padding order.
  • The study underscores the need to tailor padding strategies to specific neural network architectures for optimal sequence processing.

Effects of Padding on LSTMs and CNNs

The paper "Effects of Padding on LSTMs and CNNs" investigates the implications of padding for the performance of Long Short-Term Memory (LSTM) networks and Convolutional Neural Networks (CNNs). Focusing on a sentiment analysis task, the authors examine how different padding schemes affect model accuracy and computational efficiency when processing sequential data.

Overview and Methodology

LSTM networks and CNNs have become preferred architectures for handling sequential data, yet both require input sequences of uniform length. This constraint necessitates padding: shorter sequences are extended with filler values (typically zeroes) and longer ones truncated to match a predefined length. Zeroes can be added either before the actual data (pre-padding) or after it (post-padding). The paper employs both strategies to assess their impact on LSTM and CNN models in a sentiment classification task using Twitter data.
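The two schemes can be sketched in a few lines of plain Python. The helper below is a hypothetical stand-in for toolkit utilities such as Keras's `pad_sequences` (note that the real Keras helper truncates from the front by default, whereas this sketch truncates from the end):

```python
def pad_sequence(seq, maxlen, padding="pre", value=0):
    """Pad (or truncate) a sequence to exactly `maxlen` items.

    'pre' places the padding before the data, 'post' after it.
    """
    seq = seq[:maxlen]                  # truncate overlong sequences
    pad = [value] * (maxlen - len(seq)) # filler for short sequences
    return pad + seq if padding == "pre" else seq + pad

tweets = [[4, 7], [9, 2, 5]]  # toy token-id sequences
print([pad_sequence(t, 5, padding="pre") for t in tweets])
# [[0, 0, 0, 4, 7], [0, 0, 9, 2, 5]]
print([pad_sequence(t, 5, padding="post") for t in tweets])
# [[4, 7, 0, 0, 0], [9, 2, 5, 0, 0]]
```

Either way, every sequence in a batch ends up with the same length, which is all the downstream network requires.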

The study preprocesses the data by converting tweets into word vectors using the Word2Vec skip-gram model. Both LSTM and CNN models were then trained on this vectorized dataset, with tweets padded to a maximum sequence length of 93.

Experimental Results and Observations

The results reveal a significant divergence between pre-padding and post-padding for LSTMs. With pre-padding, the LSTM reaches approximately 80% accuracy in both training and testing, compared to around 50% with post-padding. This suggests that LSTM performance depends heavily on where the padding is placed: pre-padding keeps the informative tokens at the end of the sequence, adjacent to the final hidden state that typically feeds the classifier, whereas post-padding forces the network to carry the signal through a long run of zero inputs, where it is progressively diluted.
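The dilution effect can be illustrated with a toy one-unit recurrence. This is a deliberate simplification (a real LSTM has gates that can partially ignore padding), but it shows the mechanism: zero inputs after the data keep updating the state and wash out what was learned from the real tokens.

```python
import math

def final_state(inputs, w_h=0.5, w_x=1.0):
    """Run a toy one-unit recurrence h = tanh(w_h*h + w_x*x) and return
    the final hidden state (a stand-in for what feeds the classifier)."""
    h = 0.0
    for x in inputs:
        h = math.tanh(w_h * h + w_x * x)
    return h

data = [1.0, 2.0, 3.0]
pre  = [0.0, 0.0] + data   # pre-padding: real tokens arrive last
post = data + [0.0, 0.0]   # post-padding: zeros come after the data

print(final_state(pre))    # close to 1: reflects the last real token
print(final_state(post))   # much smaller: zeros have diluted the state
```

With these toy weights, the pre-padded final state stays near 1 while the post-padded one decays toward 0, mirroring the accuracy gap the paper reports.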

In contrast, CNNs exhibit marginal differences between pre-padding and post-padding, reflecting their inherent design to capture patterns rather than sequence dependencies. Both padding types yielded similar accuracy levels around 75% in the sentiment analysis task for CNNs.
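A toy 1-D convolution followed by global max pooling illustrates this position-invariance. This is an illustrative sketch rather than a proof (windows that straddle the padding boundary can still differ between the two schemes), but the strongest local pattern in the data is detected wherever it sits:

```python
def conv1d_global_max(seq, kernel):
    """Valid 1-D convolution followed by global max pooling."""
    k = len(kernel)
    responses = [sum(w * x for w, x in zip(kernel, seq[i:i + k]))
                 for i in range(len(seq) - k + 1)]
    return max(responses)

data, kernel = [1, 2, 3], [1, 1]
pre  = [0, 0] + data   # pre-padded input
post = data + [0, 0]   # post-padded input
print(conv1d_global_max(pre, kernel), conv1d_global_max(post, kernel))
# both 5: the strongest local pattern (2, 3) is found either way
```

Because the pooled feature depends only on the best-matching window, shifting the data relative to the padding leaves the output unchanged here, consistent with the near-identical CNN accuracies reported.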

The paper attributes these results to the architectural differences between LSTMs—which depend on memory and sequential order—and CNNs, which focus on spatial hierarchy and local patterns.

Implications and Future Directions

This research underscores the pivotal role of data preprocessing in neural network model design, specifically within the context of sequence processing tasks. The marked difference in results for LSTMs highlights the necessity of choosing appropriate padding schemes to optimize model performance.

Given the findings, future work could explore adaptive padding techniques that dynamically adjust to sequence characteristics within the dataset, potentially enhancing model robustness across diverse applications. Moreover, the implications of these findings extend beyond sentiment analysis, suggesting broader applicability to other tasks involving temporal or sequential data such as machine translation, speech recognition, and bioinformatics.
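One concrete form such adaptive padding could take is length bucketing, where each batch is padded only to its bucket's ceiling rather than to the global maximum of 93. The function and bucket size below are hypothetical, sketched as a minimal example of the idea:

```python
from collections import defaultdict

def bucket_by_length(seqs, bucket_size=10):
    """Group sequences into length buckets so each batch is padded only
    to its bucket's ceiling rather than the global maximum length."""
    buckets = defaultdict(list)
    for s in seqs:
        ceiling = ((len(s) - 1) // bucket_size + 1) * bucket_size
        buckets[ceiling].append(s)
    return dict(buckets)

seqs = [[1] * 4, [1] * 12, [1] * 9, [1] * 25]
for ceiling, group in sorted(bucket_by_length(seqs).items()):
    print(ceiling, [len(s) for s in group])
# 10 [4, 9]
# 20 [12]
# 30 [25]
```

Short tweets then carry far fewer padding tokens, which reduces both the wasted computation and the amount of padding the network must learn to ignore.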

In summary, the study provides valuable insights into the preprocessing of sequential data, with clear evidence indicating that tailoring padding strategies to the underlying neural network architecture is crucial for achieving optimal performance in machine learning models.
