RNG: Reducing Multi-level Noise and Multi-grained Semantic Gap for Joint Multimodal Aspect-Sentiment Analysis (2405.13059v1)

Published 20 May 2024 in cs.CL and cs.AI

Abstract: As an important multimodal sentiment analysis task, Joint Multimodal Aspect-Sentiment Analysis (JMASA), which aims to jointly extract aspect terms and their associated sentiment polarities from given text-image pairs, has attracted increasing attention. Existing works face two limitations: (1) multi-level modality noise, i.e., instance- and feature-level noise; and (2) a multi-grained semantic gap, i.e., coarse- and fine-grained gaps. Both issues can interfere with the accurate identification of aspect-sentiment pairs. To address these limitations, we propose a novel framework named RNG for JMASA. Specifically, to simultaneously reduce multi-level modality noise and the multi-grained semantic gap, we design three constraints: (1) a Global Relevance Constraint (GR-Con) based on text-image similarity, for instance-level noise reduction; (2) an Information Bottleneck Constraint (IB-Con) based on the Information Bottleneck (IB) principle, for feature-level noise reduction; and (3) a Semantic Consistency Constraint (SC-Con) based on mutual information maximization via contrastive learning, for multi-grained semantic gap reduction. Extensive experiments on two datasets show that RNG achieves new state-of-the-art performance.
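
The abstract describes the three constraints only at a conceptual level. As a rough sketch of how they could be realized, the PyTorch snippet below assumes a cosine-similarity relevance score for GR-Con, the variational IB bound of Alemi et al. for IB-Con, and an InfoNCE objective (van den Oord et al., 2018) for SC-Con; all function names, tensor shapes, and weightings are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

# Hypothetical sketch of the three RNG constraints as loss terms.
# Everything below is an assumed reading of the abstract, not the
# paper's released code.

def global_relevance_weight(text_emb, image_emb):
    """GR-Con (assumed form): score each text-image pair by cosine
    similarity so weakly related images can be down-weighted,
    reducing instance-level noise."""
    sim = F.cosine_similarity(text_emb, image_emb, dim=-1)  # (batch,)
    return sim.clamp(min=0.0)  # keep only non-negative relevance

def information_bottleneck_loss(mu, logvar):
    """IB-Con (assumed variational form): KL(q(z|x) || N(0, I))
    compresses the latent features, discarding feature-level noise."""
    return -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())

def semantic_consistency_loss(text_z, image_z, temperature=0.07):
    """SC-Con (assumed InfoNCE form): maximize mutual information
    between paired text/image features against in-batch negatives,
    pulling the two modalities toward a shared semantic space."""
    text_z = F.normalize(text_z, dim=-1)
    image_z = F.normalize(image_z, dim=-1)
    logits = text_z @ image_z.t() / temperature  # (batch, batch)
    labels = torch.arange(logits.size(0), device=logits.device)
    return F.cross_entropy(logits, labels)
```

In a full model these terms would presumably be added to the main aspect-sentiment extraction loss with tuned coefficients, with the GR-Con score acting as a per-instance weight rather than a standalone loss, but the exact combination is not stated in the abstract.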
