Towards Human-Level Text Coding with LLMs: The Case of Fatherhood Roles in Public Policy Documents (2311.11844v3)

Published 20 Nov 2023 in cs.CL and cs.CY

Abstract: Recent advances in LLMs like GPT-3.5 and GPT-4 promise automation with better results and less programming, opening up new opportunities for text analysis in political science. In this study, we evaluate LLMs on three original coding tasks involving typical complexities encountered in political science settings: a non-English language, legal and political jargon, and complex labels based on abstract constructs. Throughout the paper, we propose a practical workflow to optimize the choice of the model and the prompt. We find that the best prompting strategy consists of providing the LLMs with a detailed codebook, like the one provided to human coders. In this setting, an LLM can be as good as, or possibly better than, a human annotator, while being much faster, considerably cheaper, and far easier to scale to large amounts of text. We also compare GPT with popular open-source LLMs, discussing the trade-offs in model choice. Our software allows LLMs to be easily used as annotators and is publicly available: https://github.com/lorelupo/pappa.
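The codebook-prompting workflow described in the abstract — giving the model the same detailed coding instructions a human annotator would receive, then mapping its reply onto a closed label set — can be sketched as follows. This is an illustrative sketch, not the paper's pappa implementation: the labels and codebook text are placeholders for a real coding scheme, and `query_llm` stands in for whatever chat-completion API (GPT or an open-source model) is used.

```python
# Illustrative sketch of codebook-style LLM annotation. The label set and
# codebook wording below are placeholders, not the paper's actual scheme.

LABELS = ["caregiver", "breadwinner", "not_applicable"]

CODEBOOK = """You are annotating sentences from public policy documents.
Assign exactly one label describing the fatherhood role expressed:
- caregiver: the father is portrayed as providing care.
- breadwinner: the father is portrayed as providing income.
- not_applicable: no fatherhood role is expressed.
Reply with the label only."""

def build_prompt(codebook: str, text: str) -> list[dict]:
    """Package the codebook as the system message and the text to code as user input."""
    return [
        {"role": "system", "content": codebook},
        {"role": "user, ".strip(", ") or "user", "content": text},
    ]

def parse_label(response: str, labels: list[str]) -> str:
    """Map a free-form model reply onto the closed label set."""
    reply = response.strip().lower()
    for label in labels:
        if label in reply:
            return label
    return "not_applicable"  # fallback when the reply matches no label

def annotate(text: str, query_llm) -> str:
    """Annotate one text span; `query_llm` is any chat-completion callable."""
    messages = build_prompt(CODEBOOK, text)
    return parse_label(query_llm(messages), LABELS)
```

Because the model is accessed only through the `query_llm` callable, swapping GPT for an open-source LLM leaves the rest of the pipeline unchanged, which is the kind of model comparison the abstract describes.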

Authors (5)
  1. Lorenzo Lupo (5 papers)
  2. Oscar Magnusson (1 paper)
  3. Dirk Hovy (57 papers)
  4. Elin Naurin (1 paper)
  5. Lena Wängnerud (1 paper)
Citations (1)