Select-Additive Learning: Improving Generalization in Multimodal Sentiment Analysis
The paper, "Select-Additive Learning: Improving Generalization in Multimodal Sentiment Analysis," addresses challenges in training machine learning models for multimodal sentiment analysis due to limited, high-quality datasets. These constraints can lead to models creating confounding factors, undermining their generalizability. To counteract this, the authors propose a novel Select-Additive Learning (SAL) procedure specifically designed to enhance the robustness of neural networks used in sentiment classification tasks across multiple modalities—verbal, acoustic, and visual.
Methodology
The SAL procedure aims to mitigate confounding factors, such as speaker-specific attributes like wearing glasses, which can spuriously influence sentiment predictions. The paper builds on convolutional neural network (CNN) architectures, which have historically delivered the best results for multimodal sentiment analysis. SAL consists of two phases: Selection and Addition.
- Selection Phase: Uses an auxiliary neural network h(⋅;δ) to identify and isolate identity-related features in the learned representation. The auxiliary network is trained to minimize the difference between its output and the pretrained representation, so the dimensions it can reproduce are flagged as the confounding dimensions.
- Addition Phase: Adds Gaussian noise to the confounding features identified in the selection phase. The noise forces the primary model to rely on sentiment-relevant features rather than identity cues, enhancing its robustness against identity-related variation. (A code sketch of both phases follows this list.)
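To make the two phases concrete, the following is a minimal PyTorch-style sketch under stated assumptions: the names `encoder`, `classifier`, and `aux_net`, the embedding-based auxiliary network, and the hyperparameters `sigma` and `l1_weight` are illustrative placeholders rather than the authors' implementation, and the exact loss weighting and noise scaling in the paper may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

feat_dim, num_speakers, sigma, l1_weight = 128, 50, 1.0, 1e-3

# Pretrained sentiment model: `encoder` produces the representation, `classifier` predicts sentiment.
encoder = nn.Sequential(nn.Linear(300, feat_dim), nn.ReLU())   # stand-in for the pretrained CNN
classifier = nn.Linear(feat_dim, 2)
# Auxiliary network h(.; delta): maps a speaker identity to the representation space.
aux_net = nn.Sequential(nn.Embedding(num_speakers, feat_dim),
                        nn.Linear(feat_dim, feat_dim))

def selection_loss(x, speaker_id):
    """Selection phase: train aux_net to reconstruct the frozen representation
    from the speaker identity; an L1 penalty keeps only the dimensions it can
    genuinely explain, i.e. the identity-related (confounding) dimensions."""
    with torch.no_grad():
        rep = encoder(x)                 # frozen pretrained representation
    conf = aux_net(speaker_id)           # identity-predictable part of the representation
    return F.mse_loss(conf, rep) + l1_weight * conf.abs().mean()

def addition_loss(x, speaker_id, label):
    """Addition phase: corrupt the confounding part with Gaussian noise and
    retrain the sentiment model so it cannot rely on identity cues."""
    rep = encoder(x)
    with torch.no_grad():
        conf = aux_net(speaker_id)       # selected confounding dimensions (frozen)
    noisy_rep = rep + sigma * torch.randn_like(conf) * conf
    return F.cross_entropy(classifier(noisy_rep), label)

# Hypothetical usage: optimize selection_loss over aux_net first, then
# addition_loss over encoder/classifier on the training set.
```

The key design choice in this sketch is that the noise is scaled by the auxiliary network's output, so only the identity-predictable dimensions of the representation are corrupted while the rest is left intact.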
Experimental Evaluation
The efficacy of the SAL method was evaluated on three datasets: MOSI, YouTube, and MOUD, each consisting of sentiment-annotated opinion videos. The MOSI dataset served as the primary training dataset, while the others were used for cross-dataset generalization testing.
The results highlight the improvements SAL brings in generalization:
- SAL-enhanced models consistently outperformed the baseline CNN models across modalities, improving prediction accuracy on the MOSI test set and showing marked improvements on the YouTube and MOUD datasets.
- Within-dataset experiments on MOSI also showed that SAL delivered higher accuracy across all modality combinations, including unimodal, bimodal, and multimodal fusion approaches.
Implications and Future Work
Select-Additive Learning contributes to multimodal sentiment analysis by addressing the challenge of confounding factors with a simple yet effective architectural modification. This advance has potential applications in improving the reliability of sentiment analysis systems deployed in varied multimodal settings such as social media analytics, customer feedback evaluation, and emotion recognition.
Furthermore, the paper lays the groundwork for more robust learning architectures that extend SAL, for example through adaptive noise models or real-time identification of confounding factors. Future research could also investigate SAL in other areas where confounding factors pose significant issues, such as training on biased data for emotion detection and other affective computing tasks. The insights gained from SAL may likewise inform other neural-network techniques for handling the complexities of real-world data.