
New Benchmark Dataset and Fine-Grained Cross-Modal Fusion Framework for Vietnamese Multimodal Aspect-Category Sentiment Analysis (2405.00543v1)

Published 1 May 2024 in cs.CL and cs.AI

Abstract: The emergence of multimodal data on social media platforms presents new opportunities to better understand user sentiments toward a given aspect. However, existing multimodal datasets for Aspect-Category Sentiment Analysis (ACSA) often focus on textual annotations, neglecting fine-grained information in images. Consequently, these datasets fail to fully exploit the richness inherent in multimodal data. To address this, we introduce a new Vietnamese multimodal dataset, named ViMACSA, which consists of 4,876 text-image pairs with 14,618 fine-grained annotations for both text and image in the hotel domain. Additionally, we propose a Fine-Grained Cross-Modal Fusion Framework (FCMF) that effectively learns both intra- and inter-modality interactions and then fuses this information to produce a unified multimodal representation. Experimental results show that our framework outperforms SOTA models on the ViMACSA dataset, achieving the highest F1 score of 79.73%. We also explore characteristics and challenges in Vietnamese multimodal sentiment analysis, including misspellings, abbreviations, and the complexities of the Vietnamese language. This work contributes both a benchmark dataset and a new framework that leverages fine-grained multimodal information to improve multimodal aspect-category sentiment analysis. Our dataset is available for research purposes: https://github.com/hoangquy18/Multimodal-Aspect-Category-Sentiment-Analysis.
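The abstract describes a framework that learns intra- and inter-modality interactions and fuses them into a unified representation. A minimal sketch of one common way to realize inter-modality interaction is cross-attention between text-token features and image-region features followed by pooling and concatenation. This is an illustrative toy implementation under assumed feature shapes, not the authors' FCMF architecture; all names (`cross_attention`, `text_feats`, `image_feats`) are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys, values):
    # Scaled dot-product attention: each query attends over all keys.
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)
    return softmax(scores, axis=-1) @ values

# Toy features: 4 text tokens and 3 image regions, 8-dim each
# (in practice these would come from text and vision encoders).
rng = np.random.default_rng(0)
text_feats = rng.standard_normal((4, 8))
image_feats = rng.standard_normal((3, 8))

# Inter-modality interaction: text attends to image regions and vice versa.
text_to_image = cross_attention(text_feats, image_feats, image_feats)
image_to_text = cross_attention(image_feats, text_feats, text_feats)

# Fuse by mean-pooling each modality-aware stream and concatenating
# into a single unified multimodal representation.
fused = np.concatenate([text_to_image.mean(axis=0), image_to_text.mean(axis=0)])
print(fused.shape)  # (16,)
```

A real system would add learned projection matrices, multiple heads, and feed the fused vector to a per-aspect-category sentiment classifier; the sketch only shows the fusion skeleton.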

Authors (3)
  1. Quy Hoang Nguyen
  2. Minh-Van Truong Nguyen
  3. Kiet Van Nguyen