Towards Human-Level Text Coding with LLMs: The Case of Fatherhood Roles in Public Policy Documents (2311.11844v3)
Abstract: Recent advances in LLMs like GPT-3.5 and GPT-4 promise automation with better results and less programming, opening up new opportunities for text analysis in political science. In this study, we evaluate LLMs on three original coding tasks involving typical complexities encountered in political science settings: a non-English language, legal and political jargon, and complex labels based on abstract constructs. Throughout the paper, we propose a practical workflow to optimize the choice of model and prompt. We find that the best prompting strategy consists of providing the LLMs with a detailed codebook, like the one given to human coders. In this setting, an LLM can be as good as, or possibly better than, a human annotator, while being much faster, considerably cheaper, and far easier to scale to large amounts of text. We also compare GPT with popular open-source LLMs, discussing the trade-offs involved in model choice. Our software, which allows LLMs to be easily used as annotators, is publicly available: https://github.com/lorelupo/pappa.
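The codebook-prompting strategy described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' actual pappa implementation: the codebook text, labels, and example sentence are hypothetical placeholders standing in for the study's detailed, non-English codebook and construct-based labels.

```python
# Sketch of codebook-based prompting for LLM text annotation.
# The codebook, labels, and example text below are hypothetical
# placeholders for illustration only.

CODEBOOK = """\
You are an annotator coding policy documents for fatherhood roles.
Labels:
- caregiver: the father is described as providing care to children
- breadwinner: the father is described as the economic provider
- none: no fatherhood role is expressed
Return exactly one label."""

VALID_LABELS = ("caregiver", "breadwinner", "none")

def build_prompt(codebook: str, text: str) -> str:
    """Combine the detailed codebook (as one would give to human
    coders) with the text to be coded into a single LLM prompt."""
    return f"{codebook}\n\nText: {text}\nLabel:"

def parse_label(completion: str, labels=VALID_LABELS) -> str:
    """Map a raw model completion onto a valid codebook label,
    falling back to 'none' if the model answers off-codebook."""
    answer = completion.strip().lower()
    for label in labels:
        if label in answer:
            return label
    return "none"

prompt = build_prompt(CODEBOOK, "The father shall share parental leave equally.")
```

The prompt string would then be sent to the chosen model (GPT or an open-source LLM) and the completion passed through `parse_label` to obtain a clean annotation; keeping the codebook identical to the one given to human coders is what the paper identifies as the best-performing strategy.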
Authors: Lorenzo Lupo, Oscar Magnusson, Dirk Hovy, Elin Naurin, Lena Wängnerud