Programming by Chat: A Large-Scale Behavioral Analysis of 11,579 Real-World AI-Assisted IDE Sessions

Published 1 Apr 2026 in cs.SE and cs.HC | (2604.00436v1)

Abstract: IDE-integrated AI coding assistants, which operate conversationally within developers' working codebases with access to project context and multi-file editing, are rapidly reshaping software development. However, empirical investigation of this shift remains limited: existing studies largely rely on small-scale, controlled settings or analyze general-purpose chatbots rather than codebase-aware IDE workflows. We present, to the best of our knowledge, the first large-scale study of real-world conversational programming in IDE-native settings, analyzing 74,998 developer messages from 11,579 chat sessions across 1,300 repositories and 899 developers using Cursor and GitHub Copilot. These chats were committed to public repositories as part of routine development, capturing in-the-wild behavior. Our findings reveal three shifts in how programming work is organized: conversational programming operates as progressive specification, with developers iteratively refining outputs rather than specifying complete tasks upfront; developers redistribute cognitive work to AI, delegating diagnosis, comprehension, and validation rather than engaging with code and outputs directly; and developers actively manage the collaboration, externalizing plans into persistent artifacts, and negotiating AI autonomy through context injection and behavioral constraints. These results provide foundational empirical insights into AI-assisted development and offer implications for the design of future programming environments.


Authors (9)

Summary

The paper demonstrates that developers use iterative modifications and corrective turns (32.05%) to drive progressive specification in AI-assisted coding.
It employs a novel 7-category taxonomy and clustering (K-Medoids with k=6) to uncover distinct session archetypes in real-world IDE use.
Developers offload cognitive tasks through explicit context engineering, shifting from code writing to managing AI-generated outputs.

Large-Scale Empirical Analysis of Conversational Programming in IDE-Native Environments

Introduction

This paper conducts a large-scale behavioral analysis of real-world, IDE-integrated AI coding assistant usage, comprising 74,998 developer messages from 11,579 chat sessions across 1,300 repositories and 899 developers using Cursor and GitHub Copilot (2604.00436). The dataset, collected via SpecStory, captures in-the-wild conversational programming that was committed to repositories as part of authentic development activity. The analysis addresses significant gaps in the literature, which has historically suffered from limited ecological validity, relied on artificial task assignments, or focused on browser-based chatbots without codebase context.

Methodology

A mixed-methods approach grounds the analysis. Message-level behavioral intents were exhaustively categorized using a novel 7-category, 20-subcategory taxonomy, developed iteratively through abductive coding and applied with validated LLM-based classifiers (macro-F1 = 0.802). At the session level, sequences of behavioral intents were clustered using a hierarchy-aware edit distance (K-Medoids, $k = 6$) to extract session archetypes from sustained multi-turn interactions. Further analyses of temporal dynamics, transition patterns, and session-boundary behavior corroborate the findings at multiple timescales.
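The session-level clustering step can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the substitution costs (0.5 within a category, 1.0 across categories), the unit insert/delete cost, the deterministic initialization, and the label names are all hypothetical choices made for the example.

```python
def sub_cost(a, b):
    """Substitution cost between two (category, subcategory) labels."""
    if a == b:
        return 0.0
    if a[0] == b[0]:
        return 0.5  # assumed cost: same category, different subcategory
    return 1.0      # assumed cost: different categories


def edit_distance(s, t):
    """Levenshtein distance with hierarchy-aware substitution costs."""
    prev = [float(j) for j in range(len(t) + 1)]
    for i, a in enumerate(s, 1):
        cur = [float(i)]
        for j, b in enumerate(t, 1):
            cur.append(min(prev[j] + 1.0,                  # delete a
                           cur[j - 1] + 1.0,               # insert b
                           prev[j - 1] + sub_cost(a, b)))  # substitute
        prev = cur
    return prev[-1]


def assign(dist, medoids):
    """Assign every item to its nearest medoid (ties go to the earlier medoid)."""
    clusters = {m: [] for m in medoids}
    for i in range(len(dist)):
        clusters[min(medoids, key=lambda m: dist[i][m])].append(i)
    return clusters


def k_medoids(dist, k, max_iter=100):
    """Alternating k-medoids on a precomputed distance matrix.

    Starts from the first k items (an assumption made for determinism)
    and assumes the sequences are distinct.
    """
    medoids = list(range(k))
    for _ in range(max_iter):
        clusters = assign(dist, medoids)
        # New medoid of a cluster: the member with the smallest total
        # distance to the other members; an empty cluster keeps its medoid.
        new_medoids = [
            min(members, key=lambda c: sum(dist[c][j] for j in members))
            if members else m
            for m, members in clusters.items()
        ]
        if sorted(new_medoids) == sorted(medoids):
            break
        medoids = new_medoids
    return medoids, assign(dist, medoids)


# Toy sessions with hypothetical labels (not the paper's taxonomy names).
E, L, P = ("debug", "error_report"), ("debug", "log_paste"), ("plan", "explore")
sessions = [[E, E, L], [E, L, L], [P, P], [P, P, P]]
dist = [[edit_distance(a, b) for b in sessions] for a in sessions]
medoids, clusters = k_medoids(dist, k=2)
```

On this toy input the two debugging sessions end up in one cluster and the two planning sessions in the other, because swapping subcategories within a category costs less than crossing categories.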

Key Findings

Progressive Specification and Interaction Modalities

Developers predominantly engage in progressive specification rather than monolithic, upfront task description. Iterative modification and corrective turns (32.05%) outnumber initial implementation requests (5.86%) by more than five to one. Early turns focus on setting global objectives and injecting context, whereas subsequent turns shift toward reactive troubleshooting, refinement, or continuation signals, which are often densely contextualized and minimal in natural language. This reflects a division of labor in which developers actively steer and course-correct AI outputs as emergent requirements and unexpected behaviors surface.

Figure 1: Illustrative IDE-native conversational session annotated with the behavioral intent taxonomy, demonstrating the dynamics of the Failure-Driven Debugging archetype.

Cognitive Work Redistribution

There is a conspicuous redistribution of traditional developer effort. Instead of manually diagnosing errors, developers frequently report symptoms and paste machine/NLP output, transferring causal reasoning to the AI. Critical tasks—error diagnosis, validation (both static and runtime), comprehension—are systematically offloaded, with developers reserving explicit intervention mainly for high-level intent clarification. This aligns with a shift from direct code engagement to collaboration management.

Figure 2: Subcategory co-occurrence patterns expose structured dependencies—e.g., alignment correction’s high co-labeling with iterative modification and context specification—reflecting the layered collaboration.
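Co-labeling counts of the kind summarized in this figure can be computed with a few lines of code. The sketch below assumes each message carries a set of subcategory labels. The three messages are invented toy data, and the function name `cooccurrence` is hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(messages):
    """Count how often each unordered pair of labels appears on the same message."""
    counts = Counter()
    for labels in messages:
        # Sort so that each pair is always stored in the same order.
        for a, b in combinations(sorted(set(labels)), 2):
            counts[(a, b)] += 1
    return counts


# Toy data: each entry is the label set of one message.
messages = [
    {"alignment_correction", "iterative_modification"},
    {"alignment_correction", "iterative_modification", "context_specification"},
    {"error_report"},
]
pairs = cooccurrence(messages)
```

Dividing each count by the number of messages that carry either label would give a normalized score. Whether the paper normalizes in this way is not stated in the summary.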

Collaboration Management and Context Engineering

Developers actively manage AI autonomy by externalizing plans (persistent documents, Markdown files) and supplying behavioral constraints or context updates. The findings indicate a deliberate negotiation of assistant agency: context injection, persona specification, and session restarts are used to control task boundaries, tool use, and conversational state. This reveals the emergence of explicit context engineering as a developer competency.

Session Archetypes and Interaction Structure

Cluster analysis exposes six archetypal session categories, ranging from Failure-Driven Debugging to Extended Iterative Co-Development:

Planning/Comprehension: Dialogues centered on architectural decision-making and exploration.
Failure-Driven Debugging: Sessions dominated by error reporting and iterative attempts to resolve system failures.
Focused Iterative Refinement: Sustained code modifications without significant shifts into debugging.
Continuation-Driven Delegation: Upfront task specification followed by minimal oversight, characterized by repeated 'continue' directives.
Extended Iterative Co-Development: Exceptionally long, mixed-intent sessions, reflecting durable, high-bandwidth co-working processes.
Toolchain-Oriented Operations: Delegated tasks revolve around infrastructural or operational actions rather than application-level code development.

Figure 3: Session length CDF showing the dominance of short, task-focused sessions, but with a significant long tail for iterative co-development.

Figure 4: t-SNE projection visualizing the distribution and structure of session archetypes, with clearly discernible behavioral modes.

Figure 5: Per-archetype behavioral intent distributions highlight the distinct dominant modes for each cluster.

Session Dynamics

A pronounced self-reinforcing tendency is observed: behavioral acts such as iterative modification and log pasting persist in runs, manifesting as micro-cycles of debugging or correction interspersed with validation and delegation. Session boundaries typically reset the conversational state while preserving high-level task continuity, mirroring the context-window refresh pattern seen in practical IDE use. The opening turns differ significantly in length and intent composition from later turns, underlining their role in initial task framing.

Figure 6: Markov lift analysis reveals transitions most likely to persist or recur, indicating stable behavioral pathways such as iterative debugging and validation cycles.
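A transition lift of this kind can be sketched as below. The definition used here is an assumption, since the summary does not give the paper's exact formula: lift(a → b) = P(next = b | current = a) / P(next = b), estimated from adjacent turn pairs within each session. The label names and sessions are toy data.

```python
from collections import Counter


def transition_lift(sessions):
    """Lift of each observed transition a -> b over the base rate of b."""
    pair_counts = Counter()
    for seq in sessions:
        # Adjacent pairs only; transitions never cross session boundaries.
        pair_counts.update(zip(seq, seq[1:]))
    total = sum(pair_counts.values())
    from_counts, to_counts = Counter(), Counter()
    for (a, b), c in pair_counts.items():
        from_counts[a] += c
        to_counts[b] += c
    return {
        (a, b): (c / from_counts[a]) / (to_counts[b] / total)
        for (a, b), c in pair_counts.items()
    }


# Toy sessions with hypothetical intent labels.
sessions = [
    ["modify", "modify", "modify", "validate"],
    ["debug", "debug", "modify"],
]
lift = transition_lift(sessions)
```

A lift above 1 means the transition occurs more often than the base rate of its target would predict. Self-transitions with high lift correspond to the persistent runs described above.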

Implications

Benchmarks and Model Development

The centrality of progressive specification exposes a limitation of dominant evaluation paradigms (e.g., SWE-bench, HumanEval), which assume complete, static specifications. Next-generation AI coding assistants and benchmarks must account for emergent requirements, contextual ambiguity, and distributed dialogic workflows.

Workflow and Tooling

The evidence for cognitive redistribution and explicit context engineering motivates HCI and SE research into tooling that supports intent tracking, context summarization, plan externalization, and AI action auditing. As developer/AI collaboration intensifies, workflow management may eclipse code writing as the locus of developer value-add, affecting both IDE design and developer education.

Risks and Future Research

When comprehension and validation are delegated to the same assistant, error propagation and silent misalignment become acute risks—assistance without independent verification creates systemic single points of failure. Understanding and mitigating these failure modes is critical, particularly as IDE-native assistants evolve toward increased automation and agency.

Conclusion

This work establishes a robust empirical foundation for the study of conversational programming in IDE-native settings. The findings reframe the developer–AI relationship: collaborative multi-turn interaction, distributed cognition, and explicit context management supplant direct code writing as the dominant workflow. The taxonomy and behavioral archetypes developed here provide a framework for future studies as AI coding agents achieve broader capabilities and longer task horizons.
