Using LLMs in Software Design: An Empirical Study of GitHub and A Practitioner Survey

Published 2 May 2026 in cs.SE and cs.AI | (2605.01392v1)

Abstract: Recent advancements in LLMs have demonstrated significant potential across a wide range of software engineering tasks, including software design, an area traditionally regarded as highly dependent on human expertise and judgment. However, there has been little research focusing on how LLMs are used in software design, nor on the associated benefits and drawbacks. This paper aims to bridge this gap by empirically investigating how software developers utilize LLMs in the context of software design. We conduct a mixed-methods study, combining a mining study of 291 developer-ChatGPT conversations shared on GitHub with a survey of 65 software practitioners. Our findings reveal nine distinct categories of design tasks supported by ChatGPT, including architecture design, data model design, and the use of design patterns. We further characterize developer-ChatGPT interactions, showing that developers primarily use ChatGPT for knowledge acquisition and design-related code generation, with most tasks situated at the detailed design level. The study identifies seven key benefits of utilizing LLMs in software design as perceived by developers, such as better technology selection and the early detection of design flaws. We also uncover six limitations, including the generation of overly lengthy and difficult-to-read outputs, the creation of inexecutable or incorrect code, and a heavy reliance on context that can lead to hallucinated results. These findings provide an evidence-based characterization of current LLM use in software design from both open-source and practitioner perspectives, highlighting a tension between perceived benefits and limitations, which lays a foundation for future research and the development of effective techniques and tools to integrate LLMs into software design practices.

Abstract PDF Upgrade to Chat

Authors (8)

Summary

The paper empirically characterizes LLM usage in software design by mining 291 GitHub dialogues and surveying 65 practitioners to reveal task typology and iterative design interactions.
The paper employs a mixed-methods approach with open coding and constant comparison to classify nine design task categories and quantify iterative dialogue rounds.
The paper highlights practical implications for tool development and further research, emphasizing challenges like lengthy outputs, hallucinations, and context sensitivity.

Empirical Characterization of LLM Usage in Software Design

Research Motivation and Methodology

The paper "Using LLMs in Software Design: An Empirical Study of GitHub and A Practitioner Survey" (2605.01392) addresses the notable gap in the empirical understanding of how LLMs, particularly models like ChatGPT, are used in software design—a phase of the SDLC that has remained underexplored compared to code-centric tasks. Previous systematic reviews indicate that less than 1% of empirical studies focus explicitly on software design, despite the long-term implications of architectural decisions and their susceptibility to erosion and technical debt.

This study employs a mixed-methods approach, combining a mining study of 291 developer-ChatGPT conversations shared on GitHub with a survey of 65 practitioners to delineate real-world usage patterns, task typology, perceived benefits, and encountered limitations. The methodology is anchored in open coding and constant comparison techniques to analyze qualitative data and triangulate with practitioner insights.

Figure 1: Workflow for integrating mining study and practitioner survey into the research process.

Software Design Task Typology Supported by LLMs

Through qualitative analysis of the 291 ChatGPT conversation records, nine primary task categories emerge, collectively supported by LLMs:

Interface and Protocol Design (19.4%): API specification, protocol implementation, and interface optimization dominate.
Architecture Design (13.8%): Module decomposition, architectural trade-off analysis, and component orchestration.
Data Model Design (11.4%): Entity-relationship definition, database schema generation, and data mapping strategies.
Code Refactoring (10.4%): Structural, logical, and performance-driven refactoring leveraging design patterns.
Component Dependency Optimization (5.1%): Resolving circular dependencies, decoupling, and dependency injection.
Performance Optimization (8.2%): System performance tuning via caching, parallelism, and workload segregation.
Security Design (9.0%): Threat modeling, authentication, and cryptographic mechanism selection.
Use of Design Patterns (16.5%): Applying concrete design patterns for best-practices compliance.
User Interface Design (6.1%): Front-end-related layout, interactive components, and UX optimization.

Survey results further calibrate these categories, with practitioners typically reporting engagement across 2–3 distinct design task domains per project cycle.

Figure 2: Distributional counts of the nine software design task categories mined from GitHub and validated by survey.

Characterization of Developer–LLM Interaction

Three core behavioral dimensions characterize LLM-driven design collaboration:

Iterative Dialogues: Developer-LLM interaction averages 6.18 dialogue rounds per task, with non-trivial cases extending to 68 rounds, underscoring that design engagement is seldom one-shot and often requires iterative prompt refinement and context specification.
Prompt Intent: Initiating prompts primarily fall under knowledge query (32.2%), code generation (30.3%), solution recommendation (23.7%), and design verification (13.8%). This distribution reveals strong demand for design exploration, concept clarification, and candidate artifact generation.
Design Level Targeting: The majority of tasks are situated at the detailed design level (54%), notably at the granularity of classes, methods, and design patterns, followed by architectural-level concerns (30%) and lower-level code idioms (16%).

These dimensions are validated through both the mining study and practitioner survey, with statistically significant differences only in dialogue round counts, suggestive of variance between observed and self-reported experience.

Figure 3: Welcome page of the survey questionnaire—instrument used to capture practitioner views on LLM usage.

Perceived Benefits of LLMs in Software Design

Practitioners report seven significant benefits:

Reduced overhead for early-stage design—facilitating increased focus on coding.
Accelerated project onboarding—clarification of architecture and rationale.
More effective retrieval of design information—serving as a superior search engine.
Inspiration of innovative ideas—previously overlooked alternatives.
Early detection of design flaws—catching inconsistencies proactively.
Support for technology selection—navigating unfamiliar stacks.
Concept interpretation—standardizing terminology for collaborative alignment.

Numerical results highlight search efficiency (38%) and early detection of flaws (34%) as top benefits. The diversity of perceived advantages underscores LLMs' role as exploratory, advisory tools rather than fully autonomous design agents.

Reported Limitations and Friction Points

Six major limitations are reported:

Excessively lengthy, difficult-to-read outputs (49%), which increase cognitive load and reduce usability.
Inexecutable or incorrect code generation (43%), necessitating manual validation or rework.
Hallucinated responses (18%), introducing design content irrelevant to requirements context.
Ambiguity from unclear requirement articulation (15%), affecting prompt efficacy.
Limited artifact upload (17%), impeding holistic project analysis.
Strong contextual reliance (12%), undermining robustness in handling complex design scenarios.

The evidence demonstrates persistent friction arising from alignment, validation, and integration challenges inherent to LLM-supported design workflows.

Practical and Theoretical Implications

The results motivate reconsideration of LLM roles in software design. Practitioners derive maximal utility from LLMs when tasks are exploratory, well-bounded, or focused on rapid prototyping and knowledge navigation. However, LLM-generated artifacts remain subject to rigorous human oversight, with responsibility for feasibility, correctness, and contextual fit placed firmly on developers.

For researchers, these findings clarify that evaluations of LLM-based design tools need to move beyond artifact plausibility and incorporate metrics for requirements coverage, decision traceability, and verification cost. Longitudinal, in-situ studies should interrogate how prompt specificity, artifact integration, and iterative design interactions impact downstream design quality and project maintainability.

For tool developers, there is an opportunity to bridge conversational LLM output with actionable, structured software artifacts, such as API sketches, ADRs, or refactoring plans, with improved support for project context ingestion, usability, and traceability.

Future Trajectories in AI-Assisted Software Design

Future directions include:

Replications of these findings across alternate LLM architectures and industrial settings.
Controlled studies quantifying improvement (or degradation) in design quality and labor cost using LLM-generated versus human-generated artifacts.
Enhanced tool support for artifact ingestion, repository structure preservation, and output transformation to machine- and human-readable formats.

It is expected that advances in retrieval augmentation, context management, artifact linking, and long-context LLMs will further mitigate present limitations and expand LLM utility in complex, collaborative design environments.

Conclusion

This empirical study provides high-resolution evidence of LLM usage patterns, perceived benefits, and limitations in software design. LLMs offer significant exploratory, advisory, and accelerative value, predominantly at the detailed design level, but their output remains bounded by context sensitivity, verification cost, and integration friction. The observed interplay between perceived efficiency gains and practical limitations signals a transition from purely code-centric automation to more nuanced, human-in-the-loop design support paradigms. Future research and tool development should target robust context integration, artifact-driven workflows, and long-term design quality assessment to realize the full potential of AI-assisted software design.

Markdown Report Issue