New Benchmark Dataset and Fine-Grained Cross-Modal Fusion Framework for Vietnamese Multimodal Aspect-Category Sentiment Analysis (2405.00543v1)
Abstract: The emergence of multimodal data on social media platforms presents new opportunities to better understand user sentiments toward a given aspect. However, existing multimodal datasets for Aspect-Category Sentiment Analysis (ACSA) often focus on textual annotations, neglecting fine-grained information in images. Consequently, these datasets fail to fully exploit the richness inherent in multimodal data. To address this, we introduce a new Vietnamese multimodal dataset, named ViMACSA, which consists of 4,876 text-image pairs with 14,618 fine-grained annotations for both text and image in the hotel domain. Additionally, we propose a Fine-Grained Cross-Modal Fusion Framework (FCMF) that effectively learns both intra- and inter-modality interactions and then fuses this information to produce a unified multimodal representation. Experimental results show that our framework outperforms SOTA models on the ViMACSA dataset, achieving the highest F1 score of 79.73%. We also explore the characteristics and challenges of Vietnamese multimodal sentiment analysis, including misspellings, abbreviations, and the complexities of the Vietnamese language. This work contributes both a benchmark dataset and a new framework that leverages fine-grained multimodal information to improve multimodal aspect-category sentiment analysis. Our dataset is available for research purposes: https://github.com/hoangquy18/Multimodal-Aspect-Category-Sentiment-Analysis.
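The abstract describes a framework that models intra-modality interactions (within text or image) and inter-modality interactions (across the two) before fusing them into one representation. The following minimal NumPy sketch illustrates the general cross-modal attention pattern such frameworks build on; the function names, dimensions, and the simple concatenation fusion are illustrative assumptions, not the paper's actual FCMF architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(query_feats, context_feats):
    """Scaled dot-product attention from one modality over another.

    query_feats:   (n_q, d), e.g. text token embeddings
    context_feats: (n_c, d), e.g. image region embeddings
    Returns (n_q, d): queries enriched with context-modality information.
    """
    d = query_feats.shape[-1]
    scores = query_feats @ context_feats.T / np.sqrt(d)   # (n_q, n_c)
    weights = softmax(scores, axis=-1)                    # rows sum to 1
    return weights @ context_feats                        # (n_q, d)

# Toy example: 4 text tokens and 3 image regions with 8-dim features.
rng = np.random.default_rng(0)
text = rng.standard_normal((4, 8))
image = rng.standard_normal((3, 8))

# Inter-modality interaction: text attends over image regions.
text_with_image = cross_modal_attention(text, image)
# Intra-modality interaction: text self-attention.
text_self = cross_modal_attention(text, text)

# A simple late fusion into one unified representation (concatenation here;
# the actual framework fuses its interaction outputs differently).
fused = np.concatenate([text_with_image, text_self], axis=-1)
print(fused.shape)  # (4, 16)
```

In this sketch, fine-grained fusion corresponds to attending over image *regions* rather than a single global image vector, which is the kind of fine-grained multimodal information the dataset's annotations are designed to support.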
Authors: Quy Hoang Nguyen, Minh-Van Truong Nguyen, Kiet Van Nguyen