
From Prompt Engineering to Prompt Science With Human in the Loop (2401.04122v3)

Published 1 Jan 2024 in cs.HC and cs.AI

Abstract: As LLMs make their way into many aspects of our lives, one area that warrants increased scrutiny is their use in scientific research. Using LLMs to generate or analyze data for research purposes is gaining popularity. But when such use is marred by ad-hoc decisions and engineering shortcuts, we need to be concerned about how it may affect the research, its findings, or any future work that builds on them. We need a more scientific approach to using LLMs in research. While there are several active efforts to support more systematic construction of prompts, they are often focused more on achieving desirable outcomes than on producing replicable and generalizable knowledge with sufficient transparency, objectivity, or rigor. This article presents a new methodology, inspired by codebook construction in qualitative research, to address that gap. Using humans in the loop and a multi-phase verification process, the methodology lays a foundation for a more systematic, objective, and trustworthy way of applying LLMs to data analysis. Specifically, we show how a set of researchers can work through a rigorous process of labeling, deliberating, and documenting to remove subjectivity and bring transparency and replicability to the prompt-generation process. A set of experiments is presented to show how this methodology can be put into practice.
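The paper itself does not ship code, but the workflow the abstract describes (an LLM labels data with a candidate prompt, multiple human annotators independently label the same data, and agreement decides whether the prompt is accepted or sent back for deliberation and revision) can be illustrated with a short sketch. The Python below is only an assumed, minimal rendering of that loop: `llm_label`, the `annotators` callables, and `revise_prompt` are hypothetical placeholders standing in for the LLM call and the human steps, and Cohen's kappa is used as one common inter-annotator agreement measure, not necessarily the one the paper uses.

```python
from typing import Callable, Dict, List, Tuple


def cohen_kappa(labels_a: List[str], labels_b: List[str]) -> float:
    """Cohen's kappa between two label sequences over the same items (nominal labels)."""
    assert labels_a and len(labels_a) == len(labels_b)
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    categories = set(labels_a) | set(labels_b)
    expected = sum(
        (labels_a.count(c) / n) * (labels_b.count(c) / n) for c in categories
    )
    return 1.0 if expected == 1 else (observed - expected) / (1 - expected)


def refine_prompt(
    prompt: str,
    items: List[str],
    llm_label: Callable[[str, str], str],                      # (prompt, item) -> label
    annotators: Dict[str, Callable[[List[str]], List[str]]],   # name -> (items -> labels)
    revise_prompt: Callable[[str, List[str]], str],            # human deliberation step
    agreement_threshold: float = 0.7,
    max_rounds: int = 5,
) -> Tuple[str, Dict[str, float]]:
    """Iteratively verify an analysis prompt against independent human labels.

    Each round: the LLM labels every item with the current prompt, each human
    annotator labels the same items, and the prompt is accepted only if the
    LLM's agreement with every annotator clears the threshold. Otherwise the
    disputed items go back to the team, which deliberates, documents its
    reasoning, and returns a revised prompt (the human-in-the-loop step).
    """
    for _ in range(max_rounds):
        llm_labels = [llm_label(prompt, item) for item in items]
        human_labels = {name: label_fn(items) for name, label_fn in annotators.items()}

        kappas = {name: cohen_kappa(llm_labels, labels)
                  for name, labels in human_labels.items()}
        if min(kappas.values()) >= agreement_threshold:
            return prompt, kappas  # prompt passes this verification phase

        # Items where the LLM disagrees with at least one annotator feed the
        # deliberation and prompt-revision step.
        disputed = [items[i] for i, lab in enumerate(llm_labels)
                    if any(labels[i] != lab for labels in human_labels.values())]
        prompt = revise_prompt(prompt, disputed)

    raise RuntimeError("Prompt did not reach the agreement threshold")
```

Requiring the minimum pairwise kappa to clear the threshold is a deliberately conservative gate; a real study might instead compute a single chance-corrected statistic across all annotators (e.g., Krippendorff's alpha) or track agreement per label category before deciding whether another round of deliberation is needed.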

Authors (1)
  1. Chirag Shah (41 papers)
Citations (6)