Solution for Emotion Prediction Competition of Workshop on Emotionally and Culturally Intelligent AI (2403.17683v2)
Abstract: This report provides a detailed description of the method that we explored and proposed in the WECIA Emotion Prediction Competition (EPC), which predicts a person's emotion from an artistic work paired with a comment. The competition dataset is ArtELingo, designed to encourage work on diversity across languages and cultures. The dataset poses two main challenges: a modal imbalance problem and a language-cultural differences problem. To address these issues, we propose a simple yet effective approach called single-multi modal with Emotion-Cultural specific prompt (ECSP), which uses single-modal information to enhance the performance of multimodal models and a well-designed prompt to mitigate the cultural-differences problem. Specifically, our approach contains two main blocks: (1) an XLM-R \cite{conneau2019unsupervised} based unimodal model and an X$^2$-VLM \cite{zeng2022x} based multimodal model, and (2) an Emotion-Cultural specific prompt. Our approach ranked first in the final test with a score of 0.627.
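The abstract does not give the prompt wording or say how the unimodal and multimodal predictions are combined. As a rough illustration only, the two blocks could be sketched as below, assuming a plain-text prompt prefix and a weighted average of class probabilities (late fusion). The function names `build_prompt`, `fuse` and `predict`, the prompt template, the weight `text_weight` and the three-label set are all hypothetical and are not taken from the report.

```python
from typing import List, Sequence

# Hypothetical label subset for illustration; not the competition's label list.
LABELS = ["contentment", "fear", "sadness"]


def build_prompt(comment: str, language: str) -> str:
    """Prefix the comment with an assumed language/emotion cue (hypothetical template)."""
    return (
        f"This comment is written in {language}. "
        f"What emotion does it express? {comment.strip()}"
    )


def fuse(
    text_probs: Sequence[float],
    multimodal_probs: Sequence[float],
    text_weight: float = 0.5,
) -> List[float]:
    """Weighted average of two class-probability vectors (assumed late fusion)."""
    if len(text_probs) != len(multimodal_probs):
        raise ValueError("probability vectors must have the same length")
    if not 0.0 <= text_weight <= 1.0:
        raise ValueError("text_weight must lie in [0, 1]")
    return [
        text_weight * t + (1.0 - text_weight) * m
        for t, m in zip(text_probs, multimodal_probs)
    ]


def predict(labels: Sequence[str], probs: Sequence[float]) -> str:
    """Return the label with the highest fused probability."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return labels[best]


if __name__ == "__main__":
    prompt = build_prompt("The sea looks calm and endless.", "English")
    # In practice these vectors would come from the text-only and the
    # image-text classifiers; here they are made-up numbers.
    fused = fuse([0.7, 0.2, 0.1], [0.2, 0.5, 0.3], text_weight=0.5)
    print(prompt)
    print(predict(LABELS, fused))
```

Setting `text_weight` closer to 1 lets the text-only model dominate, which is one simple way to lean on the stronger modality when the modalities are imbalanced; the report itself may combine the models differently.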
- Unsupervised cross-lingual representation learning at scale. arXiv preprint arXiv:1911.02116, 2019.
- BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018.
- Describing emotions with acoustic property prompts for speech emotion recognition. arXiv preprint arXiv:2211.07737, 2022.
- Retrieval augmented language model pre-training. In International conference on machine learning, pages 3929–3938. PMLR, 2020.
- ArtELingo: A million emotion annotations of WikiArt with emphasis on diversity over language and culture. arXiv preprint arXiv:2211.10780, 2022.
- Learning transferable visual models from natural language supervision. In International conference on machine learning, pages 8748–8763. PMLR, 2021.
- Deep visual-linguistic fusion network considering cross-modal inconsistency for rumor detection. Science China Information Sciences, 66(12):222102, 2023.
- Alignment efficient image-sentence retrieval considering transferable cross-modal representation learning. Frontiers of Computer Science, 18(1):181335, 2024.
- Auxiliary information regularized machine for multiple modality feature learning. In Twenty-Fourth International Joint Conference on Artificial Intelligence, 2015.
- Learning by actively querying strong modal features. In IJCAI, pages 2280–2286, 2016.
- X$^2$-VLM: All-in-one pre-trained model for vision-language tasks. arXiv preprint arXiv:2211.12402, 2022.