Ensemble Language Models for Multilingual Sentiment Analysis (2403.06060v1)
Abstract: The rapid advancement of social media enables us to analyze user opinions. In recent times, sentiment analysis has shown a prominent research gap in understanding human sentiment based on the content shared on social media. Although sentiment analysis for commonly spoken languages has advanced significantly, low-resource languages like Arabic continue to receive little research attention due to resource limitations. In this study, we explore sentiment analysis on tweet texts from SemEval-17 and the Arabic Sentiment Tweet dataset. Moreover, we investigate four pretrained LLMs and propose two ensemble LLMs. Our findings show that monolingual models exhibit superior performance and that ensemble models outperform the baseline, while the majority voting ensemble outperforms the English language.
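The abstract names a majority voting ensemble without describing how it is built. As a rough illustration only, the sketch below shows one common way to combine per-model label predictions by majority vote. The function name `majority_vote`, the label strings, and the tie-breaking rule (fall back to the earliest-listed model among the tied labels) are assumptions made for this example, not details taken from the paper.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[str]]) -> List[str]:
    """Combine label predictions from several models by majority vote.

    predictions[m][i] is the label that model m assigned to example i.
    Tie-breaking is an assumption of this sketch: among the labels that
    share the highest count, the one predicted by the earliest-listed
    model wins.
    """
    if not predictions:
        return []
    n_examples = len(predictions[0])
    if any(len(p) != n_examples for p in predictions):
        raise ValueError("every model must predict the same number of examples")

    voted = []
    for labels in zip(*predictions):  # labels for one example, one per model
        counts = Counter(labels)
        top = max(counts.values())
        # Pick the first label (in model order) that reaches the top count.
        winner = next(label for label in labels if counts[label] == top)
        voted.append(winner)
    return voted


if __name__ == "__main__":
    # Hypothetical outputs of three sentiment classifiers on three tweets.
    model_a = ["positive", "negative", "neutral"]
    model_b = ["positive", "neutral", "negative"]
    model_c = ["negative", "neutral", "positive"]
    print(majority_vote([model_a, model_b, model_c]))
```

With an odd number of models and only two labels, ties cannot occur. With three sentiment classes they can, so any real system would need to pick a tie-breaking rule deliberately, for example by trusting the model with the best validation score.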