Shared and Private Information Learning in Multimodal Sentiment Analysis with Deep Modal Alignment and Self-supervised Multi-Task Learning (2305.08473v2)

Published 15 May 2023 in cs.CL and cs.CV

Abstract: Designing an effective representation learning method for multimodal sentiment analysis is a crucial research direction. The challenge lies in learning both shared and private information in a complete modal representation, which is difficult with uniform multimodal labels and a raw feature fusion approach. In this work, we propose a deep modal shared information learning module based on the covariance matrix to capture the shared information between modalities. Additionally, we use a label generation module based on a self-supervised learning strategy to capture the private information of each modality. Our module is plug-and-play in multimodal tasks, and by changing its parameterization it can adjust the information exchange between modalities and learn the private or shared information between specified modalities. We also employ a multi-task learning strategy to help the model focus on modality-differentiated training data. We provide a detailed derivation and feasibility proof for the design of the deep modal shared information learning module. We conduct extensive experiments on three common multimodal sentiment analysis benchmark datasets, and the experimental results validate the reliability of our model. Furthermore, we explore additional ways of combining the module with other techniques. Our approach outperforms current state-of-the-art methods on most metrics of the three public datasets.
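To make the covariance-based shared-information idea concrete, below is a minimal sketch of a correlation-alignment loss between two modality representations, in the spirit of CORAL-style second-order alignment. This is an illustration, not the authors' exact module; the names `coral_loss`, `text_feat`, and `audio_feat` are hypothetical.

```python
# Minimal sketch (assumed, not the paper's exact module): align the
# covariance statistics of two modality feature batches so that their
# shared information is encouraged, while private information can still
# be learned through modality-specific heads and self-supervised labels.
import torch


def coral_loss(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """Squared Frobenius distance between the covariance matrices of
    two (batch, dim) feature batches, normalized by feature dimension."""
    d = x.size(1)
    x = x - x.mean(dim=0, keepdim=True)   # center each modality batch
    y = y - y.mean(dim=0, keepdim=True)
    cov_x = (x.t() @ x) / (x.size(0) - 1)  # (dim, dim) covariance
    cov_y = (y.t() @ y) / (y.size(0) - 1)
    return ((cov_x - cov_y) ** 2).sum() / (4 * d * d)


# Hypothetical usage: add the alignment term to the task loss.
text_feat = torch.randn(32, 128)    # placeholder text encoder output
audio_feat = torch.randn(32, 128)   # placeholder audio encoder output
loss_shared = coral_loss(text_feat, audio_feat)
```

In a multi-task setup such as the one described in the abstract, a weighted sum of this alignment term, the main sentiment prediction loss, and per-modality losses on self-supervised unimodal labels would be optimized jointly; the weighting controls how much shared versus private information each modality retains.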

Authors (12)
  1. Songning Lai (31 papers)
  2. Jiakang Li (9 papers)
  3. Guinan Guo (4 papers)
  4. Xifeng Hu (2 papers)
  5. Yulong Li (37 papers)
  6. Yuan Tan (9 papers)
  7. Zichen Song (11 papers)
  8. Yutong Liu (21 papers)
  9. Zhaoxia Ren (2 papers)
  10. Chun Wan (1 paper)
  11. Danmin Miao (2 papers)
  12. Zhi Liu (155 papers)
Citations (9)