Fine-tuning multilingual language models in Twitter/X sentiment analysis: a study on Eastern-European V4 languages (2408.02044v1)

Published 4 Aug 2024 in cs.CL and cs.AI

Abstract: Aspect-based sentiment analysis (ABSA) is a standard NLP task with numerous approaches and benchmarks, where large language models (LLMs) represent the current state of the art. We focus on ABSA subtasks based on Twitter/X data in underrepresented languages. On such narrow tasks, small fine-tuned LLMs can often outperform universal large ones, providing accessible and inexpensive solutions. We fine-tune several LLMs (BERT, BERTweet, Llama2, Llama3, Mistral) for classification of sentiment towards Russia and Ukraine in the context of the ongoing military conflict. The training/testing dataset was obtained via the academic Twitter/X API during 2023 and narrowed to the languages of the V4 countries (Czech Republic, Slovakia, Poland, Hungary). We then measure their performance under a variety of settings, including translations, sentiment targets, in-context learning, and more, using GPT-4 as a reference model. We document several interesting phenomena demonstrating, among other things, that some models are much better fine-tunable on multilingual Twitter tasks than others, and that they can reach the SOTA level with a very small training set. Finally, we identify the combinations of settings that provide the best results.
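
The abstract describes fine-tuning encoder models such as BERT for target-specific sentiment classification on V4-language tweets. The sketch below shows what such a setup might look like with the Hugging Face Trainer; the checkpoint, label scheme, example texts, and hyperparameters are illustrative assumptions, not the paper's actual configuration.

```python
# Minimal sketch (not the authors' code): fine-tuning a multilingual BERT
# for 3-class sentiment classification towards a fixed target (e.g. Ukraine).
# Dataset fields, label mapping, and hyperparameters are assumed for illustration.
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

MODEL = "bert-base-multilingual-cased"  # covers Czech, Slovak, Polish, Hungarian
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=3)

# Hypothetical training examples: tweet text plus sentiment towards the target.
train = Dataset.from_dict({
    "text": ["Example tweet in Czech ...", "Example tweet in Polish ..."],
    "label": [0, 2],  # 0=negative, 1=neutral, 2=positive (assumed scheme)
})

def tokenize(batch):
    # Truncate/pad tweets to a fixed length before batching.
    return tokenizer(batch["text"], truncation=True,
                     padding="max_length", max_length=128)

train = train.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=3,
                           per_device_train_batch_size=16),
    train_dataset=train,
)
trainer.train()
```

A decoder model such as Llama or Mistral would instead typically be tuned with a classification head or an instruction-style prompt, and the paper's in-context-learning settings would use the base model with few-shot examples rather than gradient updates.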

Citations (1)
