Prompts Are Programs Too! Understanding How Developers Build Software Containing Prompts (2409.12447v2)

Published 19 Sep 2024 in cs.SE, cs.AI, and cs.HC

Abstract: Generative pre-trained models power intelligent software features used by millions of users controlled by developer-written natural language prompts. Despite the impact of prompt-powered software, little is known about its development process and its relationship to programming. In this work, we argue that some prompts are programs and that the development of prompts is a distinct phenomenon in programming known as "prompt programming". We develop an understanding of prompt programming using Straussian grounded theory through interviews with 20 developers engaged in prompt development across a variety of contexts, models, domains, and prompt structures. We contribute 15 observations to form a preliminary understanding of current prompt programming practices. For example, rather than building mental models of code, prompt programmers develop mental models of the foundation model (FM)'s behavior on the prompt by interacting with the FM. While prior research shows that experts have well-formed mental models, we find that prompt programmers who have developed dozens of prompts still struggle to develop reliable mental models. Our observations show that prompt programming differs from traditional software development, motivating the creation of prompt programming tools and providing implications for software engineering stakeholders.

Summary

The paper demonstrates that developers treat prompts as programs by iteratively crafting and testing them, revealing reliability challenges similar to traditional code.
The study uncovers unique developer mental models and the fragile nature of prompts, which require both qualitative and quantitative evaluation methods.
The research emphasizes the need for innovative debugging tools and educational reforms to support the rapid, unsystematic development practices in prompt programming.

Understanding Prompt Programming in Software Development

The growing prevalence of foundation models (FMs) such as GPT-4, and their application across a wide range of tasks, has given rise to the practice known as prompt engineering. This paper argues that developers use prompts much as they use programs when building FM-powered software products, and terms this practice "prompt programming". In a qualitative study using Straussian grounded theory, the authors interview 20 developers to explore how prompts function like traditional programs and to understand how developers build prompt-powered software.

Prompt Programming as a New Software Development Paradigm

The authors propose that prompt programming is distinguished from conventional programming by several unique characteristics. In this context, prompts are not merely simple instructions but represent a programmatic entity that encapsulates specific behavior patterns required from FMs to accomplish specified tasks. The paper offers critical insights into the intrinsic nature of prompt programming:

Developer Mental Models: Developers form mental models of FM behavior through iterative interactions, yet these models are consistently found to be unreliable. This unreliability raises challenges in predicting the FM's responses accurately despite repeated prompt iterations and tests.
Model and Prompt Characteristics: FMs exhibit varying qualities and competencies, which become apparent through interaction. Prompts are sensitive to nuances and often fragile, which means that they may break or perform unexpectedly due to minimal changes in the prompt wording or due to changes in the underlying model.
Development Process: Prompt development is highly iterative, rapid, and often unsystematic, in contrast to the more structured methods of traditional programming. It is further complicated by the difficulty of debugging and localizing faults within prompts, for which the systematic tools customary in software debugging are lacking.
Diverse Testing and Evaluation: Prompt programs require multifaceted evaluation strategies that combine qualitative assessment with quantitative metrics at varying scopes. Dataset curation becomes critical, which places a premium on high-quality, task-representative data.
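
To make the fragility and evaluation points above concrete, here is a minimal sketch of a quantitative check that scores two prompt wordings against a small curated dataset. It is an illustration, not a method from the paper: `fake_model`, `accuracy`, `compare_prompts`, the two prompt templates, and the dataset are all hypothetical, and `fake_model` is a deliberately brittle stand-in for a real FM call.

```python
from typing import Callable, Dict, List, Tuple

Dataset = List[Tuple[str, str]]


def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for an FM call (assumption, not a real API).

    It only answers tersely when the prompt asks for "one word",
    which mimics how small wording changes can alter model behavior.
    """
    text = prompt.lower()
    if "one word" in text and "great" in text:
        return "positive"
    if "one word" in text and "terrible" in text:
        return "negative"
    return "I think the sentiment might be mixed."


def accuracy(template: str, dataset: Dataset, model: Callable[[str], str]) -> float:
    """Fraction of examples whose model output exactly matches the label."""
    if not dataset:
        raise ValueError("dataset must not be empty")
    correct = 0
    for text, expected in dataset:
        output = model(template.format(input=text))
        if output.strip().lower() == expected:
            correct += 1
    return correct / len(dataset)


def compare_prompts(
    templates: Dict[str, str], dataset: Dataset, model: Callable[[str], str]
) -> Dict[str, float]:
    """Score every prompt variant on the same dataset."""
    return {name: accuracy(t, dataset, model) for name, t in templates.items()}


# Hypothetical curated examples: (input text, expected label).
DATASET: Dataset = [
    ("The food was great", "positive"),
    ("The service was terrible", "negative"),
]

# Two wordings that a developer might consider equivalent.
PROMPT_V1 = "Classify the sentiment in one word: {input}"
PROMPT_V2 = "Classify the sentiment briefly: {input}"

if __name__ == "__main__":
    scores = compare_prompts({"v1": PROMPT_V1, "v2": PROMPT_V2}, DATASET, fake_model)
    for name, score in scores.items():
        print(f"{name}: {score:.2f}")
```

Exact-match accuracy is only one possible metric. The qualitative assessment the paper describes, such as reading individual outputs, would sit alongside a check like this rather than be replaced by it.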

Implications and Future Directions

The analysis indicates that prompt programming diverges considerably from traditional programming paradigms, necessitating new approaches and tool support to address the unique challenges it presents. The findings provide practical implications:

Tool Development: There is a clear need for novel development tools facilitating prompt management, debugging, evaluation, and collaboration in the context of prompt chains. These tools should define mechanisms to manage rapid iteration effectively while maintaining version control and tracking change histories.
Educational Insights: Educational programs must evolve to incorporate prompt programming, fostering essential skills that merge programming intuition with empirical evaluation to harness FMs effectively. Teaching strategies could incorporate experiential learning methodologies to expedite prompt intuition development.
Cross-discipline Collaboration: The prompt programming field would benefit from interdisciplinary collaboration, integrating insights from software engineering, machine learning, and human-computer interaction communities to enhance tool support for prompt programming activities.
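
As one way to picture the change tracking mentioned under Tool Development, the following sketch records each prompt revision with a content hash and shows a line diff between the two most recent revisions. It is an assumption-laden illustration rather than a tool described in the paper: `PromptVersion`, `PromptHistory`, and their methods are hypothetical names, and only the Python standard library is used.

```python
import difflib
import hashlib
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class PromptVersion:
    digest: str  # short content hash identifying this revision
    text: str    # full prompt text
    note: str    # free-form reason for the change


@dataclass
class PromptHistory:
    """Hypothetical in-memory change history for a single prompt."""

    versions: List[PromptVersion] = field(default_factory=list)

    def record(self, text: str, note: str = "") -> PromptVersion:
        """Store a revision unless it is identical to the latest one."""
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]
        if self.versions and self.versions[-1].digest == digest:
            return self.versions[-1]
        version = PromptVersion(digest=digest, text=text, note=note)
        self.versions.append(version)
        return version

    def diff_latest(self) -> List[str]:
        """Unified diff between the two most recent revisions."""
        if len(self.versions) < 2:
            return []
        old, new = self.versions[-2], self.versions[-1]
        return list(
            difflib.unified_diff(
                old.text.splitlines(), new.text.splitlines(), lineterm=""
            )
        )


if __name__ == "__main__":
    history = PromptHistory()
    history.record("Summarize the text.", note="initial wording")
    history.record("Summarize the text in one sentence.", note="outputs too long")
    for line in history.diff_latest():
        print(line)
```

Keeping a note with each revision is a design choice meant to suit rapid iteration: it captures why a wording changed, which a diff alone does not show.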

The study characterizes prompt programming as a distinct form of software development, emphasizing its interdisciplinary nature and its particular challenges. Future research should continue to refine the methodologies and tools needed to bring prompt programming into mainstream software engineering practice. The paper contributes to understanding how integrating FMs can shape new programming methodologies, potentially redefining how developers engage with models at the intersection of natural language processing and software engineering.