Open-Source LLMs for Text Annotation: A Practical Guide for Model Setting and Fine-Tuning (2307.02179v2)

Published 5 Jul 2023 in cs.CL

Abstract: This paper studies the performance of open-source LLMs in text classification tasks typical for political science research. By examining tasks like stance, topic, and relevance classification, we aim to guide scholars in making informed decisions about their use of LLMs for text analysis. Specifically, we conduct an assessment of both zero-shot and fine-tuned LLMs across a range of text annotation tasks using news articles and tweets datasets. Our analysis shows that fine-tuning improves the performance of open-source LLMs, allowing them to match or even surpass zero-shot GPT-3.5 and GPT-4, though still lagging behind fine-tuned GPT-3.5. We further establish that fine-tuning is preferable to few-shot training with a relatively modest quantity of annotated text. Our findings show that fine-tuned open-source LLMs can be effectively deployed in a broad spectrum of text annotation applications. We provide a Python notebook facilitating the application of LLMs in text annotation for other researchers.
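The abstract describes two settings: zero-shot prompting of open-source instruction-tuned models and fine-tuning them on a modest amount of annotated text. As a rough sketch of the zero-shot setting only, the example below prompts an open-source instruction-tuned model (FLAN-T5 here, chosen for illustration) to assign a relevance label to a tweet. The model size, prompt wording, and label set are assumptions for illustration; this is not the paper's released notebook.

```python
# Minimal sketch of zero-shot text annotation with an open-source
# instruction-tuned model via Hugging Face transformers.
# Model choice, prompt wording, and label set are illustrative assumptions.
from transformers import pipeline

# FLAN-T5 is one widely used open-source instruction-tuned model family.
annotator = pipeline("text2text-generation", model="google/flan-t5-base")

LABELS = {"relevant", "irrelevant"}

def annotate(text: str) -> str:
    """Return a single relevance label for one document."""
    prompt = (
        "Classify the following tweet as 'relevant' or 'irrelevant' "
        "to the topic of content moderation.\n\n"
        f"Tweet: {text}\n"
        "Label:"
    )
    output = annotator(prompt, max_new_tokens=5)[0]["generated_text"]
    label = output.strip().lower()
    # Guard against free-form generations that are not a valid label.
    return label if label in LABELS else "unparsed"

if __name__ == "__main__":
    print(annotate("Twitter suspended the account after repeated policy violations."))
```

The fine-tuned setting the paper compares against would replace this prompt-only setup with supervised training on a relatively modest amount of annotated text (for example via the transformers Trainer API); the paper's central claim is that such fine-tuning lets open-source models match or exceed zero-shot GPT-3.5 and GPT-4 on these annotation tasks.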

Authors (8)
  1. Meysam Alizadeh
  2. Maël Kubli
  3. Zeynab Samei
  4. Shirin Dehghani
  5. Juan Diego Bermeo
  6. Maria Korobeynikova
  7. Fabrizio Gilardi
  8. Mohammadmasiha Zahedivafa