Are Large Language Models (LLMs) Good Social Predictors? (2402.12620v1)
Abstract: Prediction has served as a crucial scientific method in modern social studies. With the recent advancement of LLMs, efforts have been made to leverage LLMs to predict human features in social life, such as presidential voting. These works suggest that LLMs are capable of generating human-like responses. However, we find that the promising performance reported by previous studies arises from shortcut features in the input that are strongly tied to the response; once these shortcuts are removed, performance drops dramatically. To further revisit the ability of LLMs, we introduce a novel social prediction task, Soc-PRF Prediction, which uses general features as input and simulates real-world social study settings. Through comprehensive investigations of various LLMs, we reveal that LLMs cannot perform as expected on social prediction when given general input features without shortcuts. We further investigate possible reasons for this phenomenon, which suggest potential ways to enhance LLMs for social prediction.
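To make the Soc-PRF setting concrete, below is a minimal sketch of how such a probe might be run: an LLM is given only general respondent features (no shortcut features such as party identification when predicting a vote) and asked to predict a social outcome. This assumes the OpenAI chat completions API; the profile fields, prompt wording, and model choice are illustrative assumptions, not the paper's exact protocol.

```python
# Minimal sketch of an LLM-based social prediction probe in the spirit of the
# Soc-PRF setting described above: only general (non-shortcut) features are
# supplied, and the model must predict a social outcome.
# Field names, prompt wording, and model choice are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical respondent profile with general features only; note the
# deliberate absence of shortcut features such as party identification.
profile = {
    "age": 42,
    "gender": "female",
    "education": "bachelor's degree",
    "region": "Midwest",
    "employment": "full-time",
}

feature_text = "; ".join(f"{k}: {v}" for k, v in profile.items())
prompt = (
    "You are simulating a survey respondent with the following profile: "
    f"{feature_text}. Which candidate did this respondent vote for in the "
    "2012 U.S. presidential election? Answer with a single candidate name."
)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": prompt}],
    temperature=0,
)
print(response.choices[0].message.content)
```

In a study-style evaluation, this call would be repeated over each survey respondent and the predicted labels compared against the recorded responses; the contrast with a run that adds shortcut features to the prompt is what exposes the performance gap the abstract describes.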