
MSEVA: A System for Multimodal Short Videos Emotion Visual Analysis (2312.04279v2)

Published 7 Dec 2023 in cs.SI

Abstract: YouTube Shorts, a section launched by YouTube in 2021 as a direct competitor to short-video platforms such as TikTok, reflects the rising demand for short video content among online users. Social media platforms are flooded with short videos that capture diverse perspectives and emotions on trending events; these videos can go viral and significantly influence public mood and opinion. Affective computing for short videos, however, has been a neglected area of research. Manually monitoring public emotion through these videos demands time and effort that may be insufficient to prevent undesirable outcomes. In this paper, we create the first multimodal dataset of short video news covering trending events, and we propose an automatic technique for audio segmentation and transcription. In addition, we improve the accuracy of the multimodal affective computing model by about 4.17% through optimization. Moreover, we propose MSEVA, a novel system for emotion analysis of short videos. Achieving good results on the bili-news dataset, the MSEVA system applies multimodal emotion analysis in the real world, which helps conduct timely public opinion guidance and stop the spread of negative emotions. Data and code from our investigations can be accessed at: http://xxx.github.com.
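The abstract mentions an automatic audio segmentation and transcription pipeline but gives no implementation details here. The sketch below shows one plausible shape for such a pipeline, assuming silence-based splitting with pydub and speech recognition with the openai-whisper package; both library choices, the segment_and_transcribe helper, and the bili_news_clip.wav path are illustrative assumptions, not the authors' confirmed method.

```python
# Minimal sketch of a segment-then-transcribe pipeline.
# Assumptions: pydub for silence-based splitting, openai-whisper for ASR.
# Neither tool is confirmed by the abstract; this illustrates the idea only.
from pydub import AudioSegment
from pydub.silence import split_on_silence
import whisper

def segment_and_transcribe(audio_path: str, model_name: str = "base"):
    """Split an audio track on silence, then transcribe each segment."""
    audio = AudioSegment.from_file(audio_path)

    # Cut wherever at least 500 ms of audio sits 16 dB below the mean loudness.
    segments = split_on_silence(
        audio,
        min_silence_len=500,
        silence_thresh=audio.dBFS - 16,
        keep_silence=200,  # pad each segment so words are not clipped
    )

    model = whisper.load_model(model_name)
    transcripts = []
    for i, seg in enumerate(segments):
        tmp_path = f"segment_{i}.wav"  # hypothetical scratch files
        seg.export(tmp_path, format="wav")
        result = model.transcribe(tmp_path)
        transcripts.append(result["text"].strip())
    return transcripts

# Example usage with a hypothetical clip from the bili-news dataset:
# print(segment_and_transcribe("bili_news_clip.wav"))
```

In practice the silence threshold and minimum silence length would likely need tuning per video, since news shorts mix speech, music, and ambient sound.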
