Automated Social Science: Language Models as Scientist and Subjects (2404.11794v2)

Published 17 Apr 2024 in econ.GN and q-fin.EC

Abstract: We present an approach for automatically generating and testing, in silico, social scientific hypotheses. This automation is made possible by recent advances in large language models (LLMs), but the key feature of the approach is the use of structural causal models. Structural causal models provide a language to state hypotheses, a blueprint for constructing LLM-based agents, an experimental design, and a plan for data analysis. The fitted structural causal model becomes an object available for prediction or the planning of follow-on experiments. We demonstrate the approach with several scenarios: a negotiation, a bail hearing, a job interview, and an auction. In each case, causal relationships are both proposed and tested by the system, finding evidence for some and not others. We provide evidence that the insights from these simulations of social interactions are not available to the LLM purely through direct elicitation. When given its proposed structural causal model for each scenario, the LLM is good at predicting the signs of estimated effects, but it cannot reliably predict the magnitudes of those estimates. In the auction experiment, the in silico simulation results closely match the predictions of auction theory, but elicited predictions of the clearing prices from the LLM are inaccurate. However, the LLM's predictions are dramatically improved if the model can condition on the fitted structural causal model. In short, the LLM knows more than it can (immediately) tell.


Summary

  • The paper introduces a framework using language models and structural causal models to automate social science research, enabling hypothesis generation and testing through simulations.
  • The methodology involves using LLMs to generate hypotheses, construct simulated agents, design experiments, execute simulations, and analyze data to fit structural causal models.
  • This automated approach has the potential to accelerate social science discovery and improve reproducibility by providing a scalable and easily replicable experimental framework.

Automated Social Science: LLMs as Scientist and Subjects

The paper by Benjamin S. Manning, Kehang Zhu, and John J. Horton presents an approach that uses LLMs to automate social science research. By integrating these models with structural causal models (SCMs), the authors offer a framework that not only generates hypotheses but also tests them through simulated experiments. This automation is significant because it enables rapid, scalable experimentation in the social sciences.

The central premise relies on using structural causal models as a backbone to organize and automate the process of hypothesis generation and testing. SCMs offer a mathematically precise way to define causal relationships, thus facilitating the construction of experimental designs that can be efficiently simulated with LLM-based agents.
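As a concrete illustration (not code from the paper), the simplest SCM is a single directed edge X → Y with a linear structural equation. When the cause is set exogenously, as the system's experimental design does with agent attributes, an ordinary regression on the simulated data recovers the structural coefficient. The variable names and coefficient below are hypothetical:

```python
import numpy as np

# Hypothetical two-variable SCM: X := U_x;  Y := beta * X + U_y,
# where beta is the causal effect the experiment aims to estimate.
rng = np.random.default_rng(0)
n = 10_000
beta_true = 1.5

x = rng.normal(size=n)                  # exogenous cause (e.g. an agent attribute)
y = beta_true * x + rng.normal(size=n)  # outcome measured after the interaction

# Because x is randomized (exogenous), a simple regression slope
# is an unbiased estimate of the structural coefficient beta.
beta_hat = np.polyfit(x, y, 1)[0]
```

With ten thousand simulated interactions, `beta_hat` lands very close to the true value of 1.5, which is exactly what "fitting the SCM" means in this linear special case.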

Methodology

The methodological innovation of this research lies in structuring the simulations as a sequence of steps that mirror traditional social science research:

  1. Hypothesis Generation: The system uses LLMs to generate potential causes and outcomes within a given domain, effectively building an SCM for the hypothesis.
  2. Agent Construction: Agents are designed with relevant attributes based on the SCM, enabling them to simulate realistic roles in social scenarios.
  3. Experimental Design: The attributes of these agents are varied systematically to mimic the effect of different treatments in traditional experiments.
  4. Simulation Execution: The simulated interactions are governed by pre-defined protocols for conversational turn-taking, which are intelligently selected based on the scenario.
  5. Data Collection and Analysis: Post-experimental surveys measure the outcomes, and the data gathered are used to fit the SCM, allowing for rigorous analysis of the causal paths.
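The experimental-design step (step 3) can be sketched as a factorial design over the proposed causes. The scenario and variable names below are illustrative, not taken from the paper: each proposed cause becomes a factor, and crossing its levels yields the grid of treatment conditions across which agent attributes are varied.

```python
from itertools import product

# Hypothetical SCM for a negotiation scenario: two proposed causes
# (agent attributes to be varied) and one outcome to be measured.
causes = {
    "buyer_budget": [500, 1000, 1500],
    "seller_attachment": ["low", "high"],
}
outcome = "deal_price"

# Full factorial design: one treatment cell per combination of cause levels;
# each cell corresponds to one simulated interaction per replicate.
conditions = [dict(zip(causes, levels)) for levels in product(*causes.values())]

for cell in conditions:
    print(cell)
```

Here three budget levels crossed with two attachment levels give six treatment cells, and replicating each cell yields the dataset used in step 5 to fit the SCM.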

The authors demonstrate the approach in four scenarios: a negotiation, a bail hearing, a job interview, and an auction, highlighting the versatility of the method.

Insights and Results

The experiments yield notable results. In the auction scenario, the simulated clearing prices closely match the predictions of auction theory, suggesting that LLM simulations can capture human-like decision-making. The experiments also show that while the LLM can predict the signs of estimated effects, it cannot reliably predict their magnitudes without conditioning on the fitted SCM. This gap underscores the importance of structured experimentation for surfacing the latent knowledge within LLMs.

In terms of the pathways to causal inference, the paper underscores the need for precise causal models, as reliance on observational data or unrestricted simulations can lead to misidentification of causal effects. The SCM-based framework mitigates this by enforcing clarity and structure in experimental design.
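A small simulated example (hypothetical numbers, not from the paper) makes the identification point concrete: when an unobserved trait drives both an agent attribute and the outcome, a regression on observational data conflates the two causal paths, whereas setting the attribute exogenously, as the agent-construction step does, recovers the true effect.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000
beta = 0.5  # true direct effect of attribute X on outcome Y

# Observational world: a latent trait U drives both X and Y (confounding).
u = rng.normal(size=n)
x_obs = u + rng.normal(size=n)
y_obs = beta * x_obs + u + rng.normal(size=n)
naive = np.polyfit(x_obs, y_obs, 1)[0]   # biased: mixes beta with U's path

# Experimental world: X is assigned exogenously, severing the U -> X arrow.
x_exp = rng.normal(size=n)
y_exp = beta * x_exp + u + rng.normal(size=n)
exp_est = np.polyfit(x_exp, y_exp, 1)[0]  # recovers beta
```

In this setup the naive observational slope converges to 1.0 (the sum of the direct effect and the confounded path), while the experimental estimate converges to the true 0.5, which is why the framework insists on interventions dictated by an explicit SCM rather than unrestricted simulation.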

Implications and Future Directions

The implications of this research are manifold. Practically, it points to a future where automated systems could conduct hypothesis generation and testing at scale, greatly accelerating the pace of discovery in social sciences. Theoretically, it raises questions about the extent to which LLM-driven simulations can replace traditional human-based experiments, especially in capturing complex social behaviors.

The system's flexibility to incorporate human intervention at any stage of the process ensures that it can serve as a powerful tool for exploratory research while maintaining rigor. The ability to export and replicate these simulations with ease addresses ongoing challenges of reproducibility in social science research.

Future directions could entail optimizing the robustness of agent construction and interaction protocols, as well as integrating more complex causal frameworks within the SCMs. Furthermore, enhancing the system’s capacity to identify novel causal variables could lead to richer and more nuanced insights from automated experiments.

Overall, this paper represents a significant contribution to the field of computational social science, providing a sophisticated framework for automated hypothesis testing with profound potential to transform how social science research is conducted.
