Exploring Prompt Engineering Practices in the Enterprise

Published 13 Mar 2024 in cs.HC and cs.AI | (2403.08950v1)

Abstract: Interaction with LLMs is primarily carried out via prompting. A prompt is a natural language instruction designed to elicit certain behaviour or output from a model. In theory, natural language prompts enable non-experts to interact with and leverage LLMs. However, for complex tasks and tasks with specific requirements, prompt design is not trivial. Creating effective prompts requires skill and knowledge, as well as significant iteration in order to determine model behavior, and guide the model to accomplish a particular goal. We hypothesize that the way in which users iterate on their prompts can provide insight into how they think prompting and models work, as well as the kinds of support needed for more efficient prompt engineering. To better understand prompt engineering practices, we analyzed sessions of prompt editing behavior, categorizing the parts of prompts users iterated on and the types of changes they made. We discuss design implications and future directions based on these prompt engineering practices.

Summary

  • The paper examines prompt editing behaviors in enterprise environments, revealing iterative refinements of context and task instructions as dominant practices.
  • The paper analyzes quantitative metrics from 57 sessions, highlighting frequent parameter changes and high similarity ratios that underscore incremental edits.
  • The paper discusses implications for tool design, advocating enhanced version control, structured debugging, and standardized templates to boost efficiency.

Detailed Analysis of Enterprise Prompt Engineering Practices

Introduction and Context

The paper "Exploring Prompt Engineering Practices in the Enterprise" (2403.08950) systematically examines how practitioners in enterprise settings edit and refine prompts for LLMs across a variety of use cases. Rather than focusing on optimization strategies or technical frameworks, the study provides granular insight into prompt engineering behaviors, including the types of edits users make, their frequency, and the components of prompts most subject to modification.

Enterprise LLM use cases are diverse, comprising tasks such as code/SQL generation, content summarization, classification, extraction, and content-grounded Q&A. The operational requirements often demand outputs with high specificity and accuracy, and involve interaction with models varying in capability, cost, and specialization. The paper builds upon prior taxonomies and qualitative studies, expanding the understanding of prompt engineering to real-world enterprise contexts with dedicated tools for prompt experimentation.

Quantitative Analysis of Prompt Sessions

The analysis draws from 57 prompt editing sessions sampled from a broader dataset, capturing a total of 1523 individual prompt edits. Sessions tend to be lengthy, with a mean duration of 43.4 minutes and a median of 39 minutes, reflecting substantial iterative effort (Figure 1).

Figure 1: Distribution of prompt editing session durations, highlighting the commonly extended session lengths.

Prompt similarity ratios between successive edits predominantly fall between 0.7 and 1.0, indicating that most changes are incremental refinements or tweaks rather than wholesale revisions. Occasionally, larger edits occur, potentially reflecting parallel prompting workflows or significant task shifts (Figure 2).

Figure 2: Sequence similarity ratios between successive prompts, with values near 1 indicating incremental edits.
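
These similarity ratios can be reproduced in principle with a standard sequence-matching metric. The sketch below assumes a character-level measure such as Python's difflib.SequenceMatcher.ratio(); the exact metric and any preprocessing used in the paper are not specified here, and the example prompts are invented.

```python
from difflib import SequenceMatcher

def similarity_ratio(prev_prompt: str, curr_prompt: str) -> float:
    """Sequence similarity in [0, 1]; values near 1 indicate a small, incremental edit."""
    return SequenceMatcher(None, prev_prompt, curr_prompt).ratio()

# Hypothetical session: successive versions of the same prompt.
session = [
    "Summarize the document below in 3 bullet points.\n\nDocument: ...",
    "Summarize the document below in 3 concise bullet points.\n\nDocument: ...",
    "You are a helpful analyst. Summarize the document below in 3 concise bullet points.\n\nDocument: ...",
]

ratios = [similarity_ratio(a, b) for a, b in zip(session, session[1:])]
print([round(r, 2) for r in ratios])  # incremental edits cluster toward 1.0
```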

Parameter changes are widespread: 93% of sessions include at least one change to model parameters, most frequently switching the target model, altering max-token settings, or adjusting repetition penalties. Sessions involved a mean of 3.6 different models, suggesting a comparative, exploratory approach to model selection (Figures 3 and 4).

Figure 3: Frequency of inference parameter changes across sessions, with model selection, token limits, and repetition penalties as predominant targets.

Figure 4: Distribution of the number of models used per session, supporting observations of multi-model comparison.
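
Counts like these could be derived by diffing the inference settings attached to successive prompt submissions. A minimal sketch, assuming a hypothetical per-submission record with model_id, max_tokens, and repetition_penalty fields (the field names are illustrative, not the paper's schema):

```python
from collections import Counter
from typing import Any

def parameter_changes(submissions: list[dict[str, Any]]) -> Counter:
    """Count which inference parameters changed between successive submissions."""
    changed = Counter()
    for prev, curr in zip(submissions, submissions[1:]):
        for key in set(prev) | set(curr):
            if prev.get(key) != curr.get(key):
                changed[key] += 1
    return changed

# Hypothetical session log (illustrative values only).
session = [
    {"model_id": "model-a", "max_tokens": 200, "repetition_penalty": 1.0},
    {"model_id": "model-b", "max_tokens": 200, "repetition_penalty": 1.0},
    {"model_id": "model-b", "max_tokens": 500, "repetition_penalty": 1.2},
]

print(parameter_changes(session))             # which parameters changed, and how often
print(len({s["model_id"] for s in session}))  # distinct models used in the session
```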

Prompt Component Editing Behavior

The qualitative coding reveals that practitioners focus primarily on two components: context and task instructions. Context encompasses embedded examples, grounding documents, and input queries; it is both the most frequently edited component and a key lever for influencing model behavior. Task instructions, which describe the goal or output specifics, are edited less often than context but remain central to the iterative process (Figure 5).

Figure 5: Frequency of edits across prompt components, with context dominating and task instructions following.

Edits are predominantly modifications (refined phrasing that preserves meaning), followed by additions, changes (altering meaning), removals, and formatting edits. Pairing edit types with prompt components further clarifies prevailing patterns: context additions, task-instruction modifications, and label modifications are among the most common (Figure 6).

Figure 6: Top prompt component and edit type combinations observed, illustrating prevalent editing behaviors.
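
These pairings amount to a tally over coded edit records. A minimal sketch of that bookkeeping, using component and edit-type labels drawn from the coding scheme described above but with an assumed record structure:

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class PromptEdit:
    """One coded edit: which prompt component was touched and how."""
    component: str  # e.g. "context", "task instruction", "label", "persona"
    edit_type: str  # e.g. "modification", "addition", "change", "removal", "formatting"

# Hypothetical coded edits for a single session.
edits = [
    PromptEdit("context", "addition"),
    PromptEdit("task instruction", "modification"),
    PromptEdit("label", "modification"),
    PromptEdit("context", "removal"),
]

pair_counts = Counter((e.component, e.edit_type) for e in edits)
for (component, edit_type), count in pair_counts.most_common():
    print(f"{component} / {edit_type}: {count}")
```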

Edits are not exclusively sequential: 22% of edits involve multiple simultaneous changes, most of which include context. Nearly half of these multi-edits combine context and instruction edits, while 11% combine context and label edits. The frequency of multi-edits and parameter adjustments underscores the complexity and inefficiency of the current iterative paradigm.

Rollbacks, where practitioners undo or redo earlier edits, account for 11% of prompt edits. They are notably frequent for components that are less commonly edited, such as handle-unknown instructions (40% rollback rate), output length (25%), and persona (18%), suggesting uncertainty about their impact or adverse effects when changed.
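
One way to operationalize a rollback is an edit that returns a component to text it held earlier in the session; the per-component rate is then rollbacks divided by edits to that component. A minimal sketch under that assumed definition (the example values are invented):

```python
from collections import Counter, defaultdict

def rollback_rates(edits: list[tuple[str, str]]) -> dict[str, float]:
    """edits: (component, new_text) pairs in session order.
    An edit counts as a rollback if the component returns to previously seen text."""
    seen = defaultdict(set)                   # component -> texts observed so far
    edit_counts, rollbacks = Counter(), Counter()
    for component, text in edits:
        edit_counts[component] += 1
        if text in seen[component]:
            rollbacks[component] += 1
        seen[component].add(text)
    return {c: rollbacks[c] / edit_counts[c] for c in edit_counts}

# Hypothetical edits to a persona instruction that is tried and then reverted.
print(rollback_rates([
    ("persona", "You are a financial analyst."),
    ("persona", "You are a terse financial analyst."),
    ("persona", "You are a financial analyst."),   # rollback
]))
```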

Patterns in Context, Instructions, and Labels

Context editing emerges as dominant due to both dialog simulation and example manipulation. The interface appends generated output directly to the prompt input, which facilitates iterative elaboration on, or removal of, conversational turns and grounding data.

Editing of instructions, especially task modifications, aligns with trial-and-error practices. Variants include rewording between different formulations (commands, questions, descriptions) and adjusting detail or structure. Surprisingly, edits to secondary instruction components (output format, inclusion rules, persona, handle-unknown, output length) are relatively infrequent, potentially due to task-domain conventions or standardized requirements.

Label editing is notably common. Labels (identifiers and tags) delineate structure within prompts (instructions, context, examples, output) and serve as constructs for more precise control; users frequently modify output labels in an attempt to constrain model generation behavior, as illustrated below.
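
To make the role of labels concrete, here is a hypothetical labeled prompt of the general kind the paper describes. The specific label strings are illustrative only; the trailing output label ("Answer:") is the sort of element practitioners edit to steer generation.

```python
# Hypothetical labeled prompt; the label strings themselves are illustrative.
prompt = """Instruction: Answer the question using only the document below.
If the answer is not in the document, reply "unknown".

Document: {document}

Example question: What year was the policy introduced?
Example answer: 2019

Question: {question}
Answer:"""  # practitioners often tweak this final output label to constrain generation

print(prompt.format(document="...", question="..."))
```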

Implications for Prompt Engineering Tools and Practices

Observed behaviors (frequent rollbacks, multi-component edits, parameter switching, and heavy reliance on context) highlight cognitive challenges and inefficiencies in prompt iteration. Existing tooling, including visual prompt engineering environments and GUI-based frameworks, partially addresses these gaps but lacks robust version control and edit-impact tracking tailored to prompt engineering's requirements.

The paper suggests that systematic support for prompt debugging and testing could improve productivity, for example via enhanced edit histories, structured prompting frameworks, or semi-automated variation authoring. Standardizing structural components (labels, formatting) may yield further gains, especially in enterprise applications requiring consistent, document-grounded or multi-turn prompts.

These findings call for both tooling innovation (richer interface support, systematic experiment tracking, composable prompt frameworks) and further research into how prompt structure, context management, and edit impact feedback can be optimized for enterprise prompt engineering.
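
As one illustration of the tooling direction, a prompt-editing environment could maintain a lightweight edit history that supports the revert behavior observed in the sessions. This is an entirely hypothetical sketch, not an existing tool's API:

```python
from dataclasses import dataclass, field

@dataclass
class PromptHistory:
    """Append-only history of prompt versions with simple revert support."""
    versions: list[str] = field(default_factory=list)

    def commit(self, prompt: str) -> int:
        """Record a new prompt version and return its id."""
        self.versions.append(prompt)
        return len(self.versions) - 1

    def revert(self, version_id: int) -> str:
        """Re-commit an earlier version, mirroring the rollbacks seen in sessions."""
        prompt = self.versions[version_id]
        self.versions.append(prompt)
        return prompt

    def current(self) -> str:
        return self.versions[-1]

history = PromptHistory()
v0 = history.commit("Summarize the report in 3 bullets.")
history.commit("Summarize the report in 3 bullets. Use a formal tone.")
history.revert(v0)        # tone instruction did not help; roll back
print(history.current())
```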

Theoretical and Practical Implications, Future Directions

From a theoretical perspective, the study elucidates user mental models, revealing a heavy reliance on iterative context manipulation and task refinement as a means to control LLM behavior. The prevalence of parameter changes and rollback behaviors signals gaps in user understanding and tool-mediated feedback. The results reinforce the need for prompt engineering to be conceptualized as both a linguistic and a procedural discipline.

Practically, the insights can inform the design of LLM-driven enterprise workflows, including automated prompt optimization, modular prompt construction, editable prompt collections, and tools for comparative model evaluation. Future work may explore:

  • Prompt quality evaluation metrics based on use case taxonomy
  • Semi-automated variation exploration with impact visualization
  • Standardized template libraries for enterprise use cases
  • Enhanced support for context management and output constraint specification

Such developments may close the iteration-to-adoption loop and accelerate robust application of LLMs in knowledge-intensive domains.

Conclusion

This paper provides a granular analysis of prompt editing in enterprise settings, identifying context and task instruction as principal loci for iterative modification, with label editing and parameter changes also prevalent. Incremental edits dominate, but inefficiencies persist due to cognitive overload and lack of systematic tool support. The findings inform both the immediate development of prompt engineering assistance and the longer-term quest for standardization and automation in LLM-powered enterprise workflows.
