Multimodal Sentiment Analysis with Missing Modality: A Knowledge-Transfer Approach (2401.10747v3)
Abstract: Multimodal sentiment analysis aims to identify the emotions expressed by individuals through visual, language, and acoustic cues. However, most existing research assumes that all modalities are available during both training and testing, which makes these algorithms susceptible to the missing-modality scenario. In this paper, we propose a novel knowledge-transfer network that translates between modalities to reconstruct the missing audio modality. Moreover, we develop a cross-modality attention mechanism to retain the maximal information of the reconstructed and observed modalities for sentiment prediction. Extensive experiments on three publicly available datasets demonstrate significant improvements over baselines, and our method achieves results comparable to previous methods trained with complete multi-modality supervision.
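The sketch below is an illustrative toy example, not the authors' implementation: it shows one plausible way to pair a modality translator (reconstructing audio features from text) with a cross-modality attention block for fusion, as the abstract describes. All module names, feature dimensions (e.g., 300-d text, 74-d audio), and layer sizes are assumptions made for the example.

```python
# Illustrative sketch only (assumed architecture, not the paper's code):
# a text-to-audio translator reconstructs the missing audio stream, then a
# cross-modality attention block fuses observed and reconstructed features.
import torch
import torch.nn as nn


class ModalityTranslator(nn.Module):
    """Maps text features to reconstructed audio features via a Transformer encoder."""

    def __init__(self, text_dim=300, audio_dim=74, hidden_dim=128, num_layers=2, num_heads=4):
        super().__init__()
        self.proj_in = nn.Linear(text_dim, hidden_dim)
        layer = nn.TransformerEncoderLayer(d_model=hidden_dim, nhead=num_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.proj_out = nn.Linear(hidden_dim, audio_dim)

    def forward(self, text_seq):
        # text_seq: (batch, seq_len, text_dim) -> (batch, seq_len, audio_dim)
        return self.proj_out(self.encoder(self.proj_in(text_seq)))


class CrossModalAttention(nn.Module):
    """One modality attends to another: query = target stream, key/value = source stream."""

    def __init__(self, dim=128, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, target, source):
        fused, _ = self.attn(query=target, key=source, value=source)
        return self.norm(target + fused)  # residual connection keeps the target's information


if __name__ == "__main__":
    batch, seq_len = 8, 20
    text = torch.randn(batch, seq_len, 300)        # observed text features (dim assumed)
    audio_hat = ModalityTranslator()(text)         # reconstructed audio features

    # Project both streams to a shared dimension before cross-modal attention.
    to_shared_text = nn.Linear(300, 128)
    to_shared_audio = nn.Linear(74, 128)
    fused = CrossModalAttention(dim=128)(to_shared_text(text), to_shared_audio(audio_hat))

    sentiment_head = nn.Linear(128, 1)             # simple regression head for a sentiment score
    score = sentiment_head(fused.mean(dim=1))
    print(score.shape)                             # torch.Size([8, 1])
```

In this sketch the attention direction (text attending to reconstructed audio) is one choice; the reverse direction, or both, could equally be used, and the shared projection dimension is arbitrary.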
Authors: Weide Liu, Huijing Zhan, Hao Chen, Fengmao Lv