
Building Software by Rolling the Dice: A Qualitative Study of Vibe Coding

Published 27 Dec 2025 in cs.SE and cs.HC | (2512.22418v2)

Abstract: LLMs are reshaping software engineering by enabling "vibe coding," in which developers build software primarily through prompts rather than writing code. Although widely publicized as a productivity breakthrough, little is known about how practitioners actually define and engage in these practices. To shed light on this emerging phenomenon, we conducted a grounded theory study of 20 vibe-coding videos, including 7 live-streamed coding sessions (about 16 hours, 254 prompts) and 13 opinion videos (about 5 hours), supported by additional analysis of activity durations and prompt intents. Our findings reveal a spectrum of behaviors: some vibe coders rely almost entirely on AI without inspecting code, while others examine and adapt generated outputs. Across approaches, all must contend with the stochastic nature of generation, with debugging and refinement often described as "rolling the dice." Further, divergent mental models, shaped by vibe coders' expertise and reliance on AI, influence prompting strategies, evaluation practices, and levels of trust. These findings open new directions for research on the future of software engineering and point to practical opportunities for tool design and education.

Summary

  • The paper investigates vibe coding by analyzing 20 videos using grounded theory, revealing a spectrum from high-AI-reliance to hybrid human-AI workflows.
  • The paper finds that stochastic LLM outputs result in design fixation and prompt redundancy, challenging traditional debugging and code review practices.
  • The paper highlights the need for adaptive tooling, improved explainability, and enhanced pedagogical approaches to support effective AI-assisted software development.

Building Software by Rolling the Dice: A Qualitative Study of Vibe Coding

Introduction and Methodology

"Building Software by Rolling the Dice: A Qualitative Study of Vibe Coding" (2512.22418) provides an extensive qualitative investigation into "vibe coding," a mode of software development in which developers direct LLMs through natural language prompts to generate code and construct systems. Rather than programming through direct code manipulation and iterative text editing, vibe coders operate primarily at the prompt level, with varying degrees of code inspection and intervention. The authors apply Straussian Grounded Theory to a heterogeneous corpus of 20 publicly available videos (7 live-streamed coding sessions and 13 post-hoc opinion videos), supplemented with mixed-method analyses of coder time allocation and prompting intent. This approach enables the authors to systematically examine practice diversity, tool affordances, prompting strategies, and the influence of coder background and mental models.

Figure 1

Figure 1: Grounded Theory process adopted for investigating vibe coding, from literature sensitization and data collection to qualitative and mixed-method analysis.

Conceptual Framework of Vibe Coding

Vibe Coding Practice Spectrum: The study reveals a behavioral continuum across practitioners. At one extreme, high-AI-reliance coders ("YOLO mode") treat code as an opaque artifact and delegate all inspection, debugging, and evaluation to models and UI outputs. At the other extreme, technically proficient practitioners use LLMs for macro-level edits or initial scoping, followed by substantial code review, targeted prompting, and manual integration. The stochastic nature of LLM output permeates both ends, with the debugging process frequently feeling probabilistic—hence the "rolling the dice" analogy.

Figure 2

Figure 2: A detailed model of how vibe coders interact with tools, models, and artifacts, and how their mental models are updated via prompting and evaluation cycles.

Mental Models and Reliance: Vibe coders' expertise and trust in AI influence their construction of prompts, their evaluation standards, and their tooling workflows. Notably, mental models are not static: they evolve in response to tool feedback, prior experience, and failures, but remain susceptible to inaccuracy due to tool opacity (e.g., hidden context or system prompts).

Toolchains and Wrappers: Practitioners commonly orchestrate multi-tool workflows, leveraging both online platforms (VZero, Replit, Bolt) for prototyping or vernacular tasks and IDE-integrated agents (Cursor, Windsurf, Claude Code) for refinement and control. Model switching and wrapper combination are pragmatic, not dogmatic, driven by cost, speed, response style, or coverage requirements.

Empirical Findings

Figure 3

Figure 3: Empirical breakdown of activity duration across live-streamed vibe coding sessions, illustrating heterogeneity in code inspection, output review, prompting, and waiting time.

Time Allocation and Behavioral Metrics: High-AI-reliance sessions are dominated by prompt submission, review of AI responses, and waiting for generation, with minimal time allocated to code inspection or external resources. Indeed, waiting for model output comprised >20% of session time across all observed livestreams, peaking above 50% in some cases. Technically proficient coders spent more session time navigating, editing, and inspecting code, echoing findings in recent large-scale quantitative studies [ziegler2024measuring].
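The time-allocation figures above (e.g., waiting exceeding 20% of session time) come from aggregating annotated activity durations. A minimal sketch of that aggregation is below; the event format and activity labels are hypothetical illustrations, not the paper's actual coding scheme.

```python
from collections import defaultdict

def activity_breakdown(events):
    """Aggregate (activity, seconds) log entries into percentage shares."""
    totals = defaultdict(float)
    for activity, seconds in events:
        totals[activity] += seconds
    grand = sum(totals.values())
    return {a: round(100 * s / grand, 1) for a, s in totals.items()}

# Hypothetical session log: (activity, duration in seconds)
session = [
    ("prompting", 300), ("waiting", 900), ("reviewing_output", 500),
    ("inspecting_code", 100), ("waiting", 200),
]
shares = activity_breakdown(session)  # e.g., waiting dominates this session
```

In this toy log, waiting accounts for 55% of the session, mirroring the high-reliance pattern the study observed.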

Rolling-the-Dice Prompting: Method-level prompt redundancy is pronounced among high-reliance practitioners—up to 40% of prompts in select sessions involve reiterating the same request or copying unchanged error messages, hoping for different model outcomes. In contrast, coders familiar with traditional debugging exhibited lower prompt redundancy and inserted new hypotheses or contextual details, reducing repeated stochastic retries.
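Prompt redundancy of the kind described above can be approximated by normalizing prompts and counting repeats. This is a simplified sketch, not the study's measurement protocol; real analyses would likely use fuzzier similarity than exact matching after normalization.

```python
import re

def normalize(prompt):
    """Lowercase and collapse whitespace so trivial edits don't hide repeats."""
    return re.sub(r"\s+", " ", prompt.strip().lower())

def redundancy_rate(prompts):
    """Fraction of prompts that repeat an earlier one verbatim (after normalization)."""
    seen, repeats = set(), 0
    for p in prompts:
        key = normalize(p)
        if key in seen:
            repeats += 1
        seen.add(key)
    return repeats / len(prompts) if prompts else 0.0

session_prompts = [
    "Fix the login bug",
    "fix the   login bug",  # verbatim retry: only case/whitespace differ
    "Fix the login bug; the token may be expiring early",  # new hypothesis
]
rate = redundancy_rate(session_prompts)  # one of three prompts is a repeat
```

The third prompt illustrates the lower-redundancy pattern: adding a hypothesis rather than rolling the dice again.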

Debugging as Stochastic Search: Debugging via LLMs becomes a process akin to sampling from an uncertain distribution. Model inconsistency, context drift, and token window limitations mean that even identical prompts can yield divergent outputs, corroborating empirical analyses of LLM non-determinism in code generation [ouyang_empirical_2025].
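The stochastic-search loop can be made concrete with a retry-until-tests-pass sketch. The `sample_patch` stand-in below simulates an LLM whose identical prompts yield different patches; it is an assumption for illustration, not an API from the paper.

```python
import random

def debug_by_sampling(generate, passes_tests, max_attempts=10):
    """Retry generation until the tests pass: debugging as stochastic search."""
    for attempt in range(1, max_attempts + 1):
        patch = generate()
        if passes_tests(patch):
            return patch, attempt
    return None, max_attempts

# Stand-in for an LLM call: the same prompt can yield different patches.
rng = random.Random()
candidates = ["off_by_one_patch", "null_check_patch", "correct_patch"]
sample_patch = lambda: rng.choice(candidates)

patch, attempts = debug_by_sampling(sample_patch,
                                    lambda p: p == "correct_patch",
                                    max_attempts=50)
```

The number of attempts varies run to run, which is exactly the unpredictability practitioners describe as "rolling the dice."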

Design Fixation: A recurring behavioral trap is early anchoring on initial AI outputs, with subsequent work focused primarily on refinement rather than systematic exploration of alternatives. This design fixation, also observed in generative AI-assisted creative tasks [wadinambiarachchi_effects_2024], has implications for technical debt accrual and defect propagation [SOLIMAN2021106669].

Implications and Theoretical Considerations

Practical Tooling Challenges: The stochastic, non-deterministic, and context-sensitive properties of LLM-driven workflows present significant engineering challenges. Coders mitigate unpredictability by constraining changes to small increments, versioning aggressively, and leveraging automated tests and linters where feasible. However, the lack of interface transparency (i.e., hidden system prompts, undisclosed context, non-human-like memory) impedes the development of accurate user mental models and effective trust calibration [wang_investigating_2024].
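The "small increments plus aggressive versioning" mitigation amounts to a guarded-apply loop: apply a change, run the tests, and roll back on failure. Below is a toy in-memory sketch of that discipline (a dict standing in for a repository, a deep copy standing in for a git commit); it is illustrative, not tooling described in the paper.

```python
import copy

def guarded_apply(repo, patch, run_tests):
    """Apply a small change, run the suite, and roll back on failure."""
    snapshot = copy.deepcopy(repo)  # cheap stand-in for a pre-change commit
    repo.update(patch)
    if run_tests(repo):
        return True                 # change passed: keep it
    repo.clear()
    repo.update(snapshot)           # restore the last known-good state
    return False

# Hypothetical repo seeded with a bug; the "tests" check for the fixed line.
repo = {"app.py": "def add(a, b): return a - b"}
run_tests = lambda r: "a + b" in r["app.py"]
```

With this loop, a bad AI-generated patch never survives past the failing test run, bounding the blast radius of any single stochastic change.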

Prompt Programming and Context Management: As prompts evolve into first-class programmatic artifacts [liang_prompts_2025; beurer-kellner_prompting_2023], the burden of context management, prompt tracking, and intent disambiguation moves from the code to natural language interfaces. Practitioners exploit strategies such as context window resets, explicit file/folder referencing, and parallel session management, but these are stopgap measures in the absence of robust support for prompt engineering and context provenance [wu_ai_2022; wu_promptchainer_2022].
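The context-management strategies mentioned above (explicit file referencing, context resets under a token budget) can be sketched as a small buffer abstraction. The class below is a hypothetical illustration; real tools tokenize properly rather than counting whitespace-separated words, and their eviction policies are more sophisticated.

```python
class PromptContext:
    """Toy context buffer: pinned files survive resets, chat turns are evictable."""

    def __init__(self, token_budget):
        self.token_budget = token_budget
        self.pinned = {}   # path -> contents; explicit references kept across resets
        self.turns = []    # rolling conversation history

    @staticmethod
    def tokens(text):
        return len(text.split())  # crude whitespace proxy for tokenization

    def used(self):
        return (sum(self.tokens(t) for t in self.turns)
                + sum(self.tokens(c) for c in self.pinned.values()))

    def pin(self, path, contents):
        self.pinned[path] = contents

    def add_turn(self, text):
        self.turns.append(text)
        while self.used() > self.token_budget and self.turns:
            self.turns.pop(0)  # evict oldest turns first when over budget

    def reset(self):
        self.turns.clear()     # explicit context reset; pinned files remain
```

Pinning separates deliberate, durable context (referenced files) from conversational drift, which is the distinction practitioners exploit when they reset a session but re-reference key files.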

Cognitive Work and Pedagogy: The study affirms that even minimal technical knowledge (e.g., understanding of code structure, error analysis, or test design) dramatically enhances prompt construction, evaluation accuracy, and bug detection. Pedagogically, explicit instruction in programming fundamentals and literacy in AI model capabilities remain essential. Without foundational scaffolding, users struggle to ask the right questions or recognize model hallucinations, a finding with serious implications for the AI-driven democratization of programming [prather_widening_2024].

Agentic Workflows and Multi-Agent Orchestration: Vibe coding exemplifies the early adoption of multi-agent, agentic workflows for software engineering, wherein specialized LLM-powered agents autonomously divide labor, coordinate edits, and self-evaluate. While promising for scalability and rapid prototyping, cascading stochastic errors, failure recovery, and inter-agent coordination remain open research challenges [heAgenticAIRoadMap2025; hong_metagpt_2023].
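The division of labor described above can be sketched as a minimal generator-reviewer loop: one agent drafts, another critiques, and the orchestrator iterates until approval or a round limit. The agents below are deterministic stubs standing in for LLM calls; the structure, not the stub logic, is the point.

```python
def generator(task, feedback):
    """Stub generator agent: produces a draft, revising in response to feedback."""
    draft = task
    if feedback and "add docstring" in feedback:
        draft += ' """doc"""'
    return draft

def reviewer(draft):
    """Stub reviewer agent: returns feedback to act on, or None to approve."""
    if '"""' not in draft:
        return "add docstring"
    return None

def orchestrate(task, max_rounds=3):
    """Iterate generate -> review until approval or the round budget runs out."""
    feedback = None
    draft = task
    for _ in range(max_rounds):
        draft = generator(task, feedback)
        feedback = reviewer(draft)
        if feedback is None:
            return draft
    return draft  # best effort after exhausting rounds
```

The round limit matters: with stochastic agents, unbounded loops can oscillate, which is one form of the cascading-error problem the section flags as open research.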

Directions for Future Research

  • Mitigating Design Fixation: Research is needed to scaffold divergent design exploration, potentially via structured visualization of the design space, automated suggestion of alternatives, or prompt chaining systems that surface orthogonal solutions [suh_luminate_2024].
  • Explainability and Auditing: Improving auditability of LLM outputs and supporting co-audit workflows [gordon_co-audit_2023] are essential for trust calibration and error recovery. Integrating provenance-tracking, behavioral explanations, and scenario replay may reduce negative impacts from stochastic code changes.
  • Adaptive Tooling: IDEs and AI agents should be further adapted to coders’ skill levels, providing scaffolding and decision support for novices while remaining non-intrusive for experts. Automated context management, prompt provenance, and conflict resolution for parallel agentic workflows are open problems.
  • Empirical Generalization: Extending analyses beyond English-language content and high-level languages, and evaluating industrial, enterprise-scale maintenance scenarios, is needed to delineate the boundaries and sustainability of vibe coding practices.

Conclusion

This qualitative study (2512.22418) provides a detailed conceptual and empirical account of vibe coding as an emergent socio-technical phenomenon. Vibe coders collectively embody a spectrum from high-AI-reliance, prompt-only workflows to hybrid human-AI code review and intervention. Despite productivity gains and increased accessibility, reliance on stochastic LLM output introduces challenges: debugging becomes probabilistic, fixation on generated code and designs propagates technical debt, and prompt programming demands new strategies for context and artifact management. These findings have immediate implications for tool design, prompt engineering, and pedagogy, and furnish a framework for future research on agentic software engineering, critical AI literacy, and productive human-AI code co-creation.



Reference:

"Building Software by Rolling the Dice: A Qualitative Study of Vibe Coding" (2512.22418)
