
Solution for Emotion Prediction Competition of Workshop on Emotionally and Culturally Intelligent AI (2403.17683v2)

Published 26 Mar 2024 in cs.AI

Abstract: This report provides a detailed description of the method we explored and proposed for the WECIA Emotion Prediction Competition (EPC), which predicts a person's emotion from an artwork and an accompanying comment. The competition dataset is ArtELingo, which is designed to encourage work on diversity across languages and cultures. The dataset poses two main challenges: the modal imbalance problem and the language-cultural differences problem. To address these issues, we propose a simple yet effective approach called single-multi modal with Emotion-Cultural specific prompt (ECSP), which uses single-modal information to enhance the performance of multimodal models and a well-designed prompt to reduce the cultural differences problem. Specifically, our approach contains two main blocks: (1) an XLM-R [1] based unimodal model and an X²-VLM [11] based multimodal model, and (2) an Emotion-Cultural specific prompt. Our approach ranked first in the final test with a score of 0.627.
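The abstract names the two blocks but not how they are wired together. The following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes that the unimodal (XLM-R) and multimodal (X²-VLM) classifiers each emit one logit per emotion class. It then fuses their softmax probabilities with a fixed weight, a plain late-fusion scheme swapped in for illustration, and prepends a hypothetical emotion-and-culture prompt to the comment. The nine-label list, the prompt wording, and the names `EMOTIONS`, `build_prompt`, `fuse`, and `weight` are all illustrative assumptions.

```python
import math
from typing import List, Sequence

# Assumed label set (ArtEmis-style emotions); the abstract does not list the classes.
EMOTIONS = ["amusement", "awe", "contentment", "excitement",
            "anger", "disgust", "fear", "sadness", "something else"]


def build_prompt(comment: str, language: str) -> str:
    """Prepend a HYPOTHETICAL emotion/culture prompt to a comment.

    The template wording is an illustrative assumption, not the ECSP prompt.
    """
    return (f"Language: {language}. Possible emotions: {', '.join(EMOTIONS)}. "
            f"Comment: {comment}")


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(uni_logits: Sequence[float], multi_logits: Sequence[float],
         weight: float = 0.5) -> int:
    """Return the argmax class of a weighted average of two probability vectors.

    ASSUMPTION: fixed-weight late fusion; the abstract does not say how the
    unimodal and multimodal outputs are combined.
    """
    p_uni = softmax(uni_logits)
    p_multi = softmax(multi_logits)
    fused = [weight * a + (1.0 - weight) * b for a, b in zip(p_uni, p_multi)]
    return max(range(len(fused)), key=fused.__getitem__)


if __name__ == "__main__":
    text_logits = [0, 0, 5, 0, 0, 0, 0, 0, 0]   # text model is confident in class 2
    vlm_logits = [1, 0, 0, 0, 0, 0, 0, 0, 0]    # multimodal model leans toward class 0
    print(build_prompt("A calm seascape at dusk", "English"))
    print(EMOTIONS[fuse(text_logits, vlm_logits)])
```

Lowering `weight` shifts the decision toward the multimodal model (with `weight=0.0` the result is simply the multimodal argmax); any weighting actually used by ECSP would have to be taken from the full report.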

References (11)
  1. Unsupervised cross-lingual representation learning at scale. arXiv preprint arXiv:1911.02116, 2019.
  2. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018.
  3. Describing emotions with acoustic property prompts for speech emotion recognition. arXiv preprint arXiv:2211.07737, 2022.
  4. Retrieval augmented language model pre-training. In International Conference on Machine Learning, pages 3929–3938. PMLR, 2020.
  5. ArtELingo: A million emotion annotations of WikiArt with emphasis on diversity over language and culture. arXiv preprint arXiv:2211.10780, 2022.
  6. Learning transferable visual models from natural language supervision. In International Conference on Machine Learning, pages 8748–8763. PMLR, 2021.
  7. Deep visual-linguistic fusion network considering cross-modal inconsistency for rumor detection. Science China Information Sciences, 66(12):222102, 2023.
  8. Alignment efficient image-sentence retrieval considering transferable cross-modal representation learning. Frontiers of Computer Science, 18(1):181335, 2024.
  9. Auxiliary information regularized machine for multiple modality feature learning. In Twenty-Fourth International Joint Conference on Artificial Intelligence, 2015.
  10. Learning by actively querying strong modal features. In IJCAI, pages 2280–2286, 2016.
  11. X²-VLM: All-in-one pre-trained model for vision-language tasks. arXiv preprint arXiv:2211.12402, 2022.
