AI-Assisted End-User Coding
- AI-assisted end-user coding is the practice of using generative AI models to translate natural language into executable code, offering greater flexibility than traditional low-code/no-code platforms.
- Interaction models such as the CUPS taxonomy and human-in-the-loop decoding systems are used to analyze and structure code generation, verification, and debugging in these workflows.
- Empirical findings indicate that while these systems reduce development friction and can achieve success rates above 70%, they also introduce challenges such as increased verification burdens and prompt complexity.
AI-assisted end-user coding refers to the practice of enabling individuals—especially non-experts in formal programming—to construct software artifacts, scripts, or workflows primarily via interaction with generative AI models, often using natural language, visual modalities, or lightweight code prompts. Contemporary techniques leverage LLMs as code generators, recommenders, or dialogue agents, fundamentally altering traditional paradigms of end-user programming (EUP) and creating a distinct hybrid between coding, prompt engineering, and GUI-based automation.
1. Theoretical Foundations and Distinctions
AI-assisted end-user coding is conceptually delineated from traditional low-code/no-code (LCNC) environments. LCNC platforms provide component-based, drag-and-drop authoring with limited code exposure, yielding rapid prototyping at the cost of flexibility and an increased risk of vendor lock-in. In contrast, AI-assisted paradigms empower users to issue fine-grained, natural-language or semi-structured prompts to LLMs, generating artifacts such as HTML, JavaScript, Python, and integrations with cloud services, often outside proprietary runtimes (Weber, 5 Dec 2025).
A qualitative framework distinguishes:
- LCNC: Assembles pre-built widgets, relying on vendor hosting and compliance guardrails.
- AI-assisted coding: Offers free-form customization, increased reusability (plain code artifacts), direct code access, and reduced lock-in.
While formal mathematical modeling of flexibility, reusability, or lock-in is absent in current empirical studies, qualitative analysis suggests flexibility scales with the expressive space of code, and reusability grows as generated artifacts are modular and platform-agnostic (Weber, 5 Dec 2025).
2. Interaction Models and Taxonomies
A central contribution to understanding AI-assisted coding workflows is the CUPS taxonomy, which annotates user–AI interactions by four high-level categories and twelve leaf states (Mozannar et al., 2022):
| Top-level | Leaf States (examples) |
|---|---|
| Coding | Writing New Code, Editing Last Suggestion, Debugging/Testing Code |
| Understanding | Verifying Suggestion, Deferring Thought, Waiting for Suggestion, High-Level Planning |
| Prompting | Prompt Crafting |
| Scrolling | Browsing Suggestions |
This taxonomy enables detailed telemetry-driven analysis of programmer behavior in environments with AI code recommendation. Retrospective labeling of thousands of session segments shows that 51.5% of developer time is spent in Copilot-specific states, with 22.4% dedicated solely to cognitive verification of AI suggestions and frequent deferral of verification—often leading to post-acceptance edits and an underestimated “verification workload” if only pre-accept intervals are measured. Segment-based time accounting and transition Markov chains provide new, workflow-centric metrics, such as the normalized time per state ($\bar{t}_s = t_s / \sum_{s'} t_{s'}$), state transition probabilities ($P(s_{t+1} = j \mid s_t = i)$), and entropy rates that quantify workflow predictability (observed entropy rate vs. the uniform baseline of $\log_2 12 \approx 3.58$ bits over the twelve leaf states) (Mozannar et al., 2022).
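These state-level metrics are straightforward to derive from labeled telemetry. The following minimal sketch assumes hypothetical CUPS-labeled session segments (the labels follow the taxonomy above, but the durations are illustrative, not data from the study) and computes the normalized time per state, first-order transition probabilities, and an occupancy-weighted entropy rate against the $\log_2 12$-bit uniform baseline:

```python
from collections import Counter, defaultdict
import math

# Hypothetical CUPS-labeled telemetry: (leaf state, segment duration in seconds).
segments = [
    ("Prompt Crafting", 12.0),
    ("Waiting for Suggestion", 3.0),
    ("Verifying Suggestion", 20.0),
    ("Writing New Code", 41.0),
    ("Verifying Suggestion", 9.0),
    ("Editing Last Suggestion", 15.0),
]

# Normalized time per state: share of total session time spent in each state.
time_in_state = defaultdict(float)
for state, duration in segments:
    time_in_state[state] += duration
session_time = sum(time_in_state.values())
time_share = {s: t / session_time for s, t in time_in_state.items()}

# First-order Markov transition probabilities P(next state | current state).
transition_counts = defaultdict(Counter)
for (current, _), (nxt, _) in zip(segments, segments[1:]):
    transition_counts[current][nxt] += 1
transition_probs = {
    s: {t: c / sum(nexts.values()) for t, c in nexts.items()}
    for s, nexts in transition_counts.items()
}

# Occupancy-weighted entropy rate of the transition chain, compared with the
# uniform baseline log2(12) ≈ 3.58 bits over the twelve leaf states.
occupancy = Counter(s for s, _ in segments[:-1])  # states with outgoing transitions
n_transitions = sum(occupancy.values())
entropy_rate = sum(
    (occupancy[s] / n_transitions)
    * -sum(p * math.log2(p) for p in probs.values())
    for s, probs in transition_probs.items()
)
print(time_share, transition_probs, entropy_rate, math.log2(12))
```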
3. Empirical Patterns and Usability Findings
Multiple empirical studies converge on a nuanced picture of AI-assisted programming:
- LLM-assisted coding is neither direct search-and-reuse, pure compilation, nor a straightforward extension of extant autocomplete. It is marked by probabilistic, non-deterministic generations, variable granularity (from single lines to full modules), and opacity of code provenance (Sarkar et al., 2022).
- Adoption patterns among students and non-experts reveal clear resource bifurcation: AI-based assistants (autocomplete) are preferred for code generation; conversational chatbots are preferred for debugging and problem decomposition. Novices tend to avoid direct human help and rely heavily on AI, whereas experts use memory and manual resources more frequently (Echeverry et al., 6 Aug 2025).
- In spreadsheet environments, non-programmers using LLM-based formula synthesis systems encounter challenges in decomposing high-level intents to computable steps, validating AI-generated formulas, and maintaining artifacts. Users often prefer direct manipulation over reusable code for ad-hoc tasks and display a “dilemma of the direct answer” (Sarkar et al., 2022).
- Large-scale case studies demonstrate that technically inclined non-programmers can build complete web survey apps via chat-based LLM interaction, with a 73% overall success rate reported among students building HTML/CSS/JS/Apps Script integrations. Success correlates with prompt iteration, access to compatible backends, and strategic use of AI explanations. Perceived effort aligns with outcomes, though integration and debugging (especially authentication/configuration) are primary friction points (Weber, 5 Dec 2025).
4. Interaction Frameworks and System Architectures
Emerging platforms exemplify model-driven and agent-oriented architectures to enhance end-user accessibility:
- AIAP: A multi-agent, no-code workflow builder that transforms ambiguous natural-language user queries into structured, executable visual workflows (An et al., 4 Aug 2025). Key elements include query processing, modular decomposition into labeled Data/Action/Context entities, action retrieval via semantic embedding similarity (a minimal retrieval sketch follows this list), and iterative plan refinement with human-in-the-loop oversight. The front end offers real-time AI-generated suggestions, node-based drag-and-drop workflow composition, and dynamic API linking, all without exposing raw code.
- HiLDe: A human-in-the-loop decoding system for code completion that exposes “critical” token-level decisions to the user for direct intervention, guiding the LLM’s output towards user-aligned or domain-compliant solutions. Token probabilities are re-weighted by an importance score, uncertain (high-entropy) positions are highlighted, and the user is invited to selectively review alternatives at those points (see the entropy sketch below). Empirical results show a 31% reduction in security vulnerabilities on coding tasks and increased intentional code alignment, with only a moderate increase in latency and no significant uptick in cognitive load (González et al., 28 May 2025).
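As a concrete illustration of the retrieval step in a system like AIAP, the sketch below ranks catalog actions against a decomposed workflow step by cosine similarity of embeddings. The `embed` stand-in, the `ACTION_CATALOG` entries, and `retrieve_actions` are hypothetical; a real system would call an actual sentence-embedding model rather than the hash-seeded placeholder used here:

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Stand-in embedding: deterministic random unit vector per string.
    Replace with a real sentence-embedding model in practice."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.normal(size=384)
    return v / np.linalg.norm(v)

# Illustrative action catalog: name -> natural-language description.
ACTION_CATALOG = {
    "send_email": "Send an email to a recipient with a subject and body",
    "create_sheet_row": "Append a row to a spreadsheet",
    "http_request": "Call an external REST API and return the JSON response",
}

def retrieve_actions(step_description: str, top_k: int = 2):
    """Rank catalog actions by cosine similarity to a decomposed workflow step."""
    query = embed(step_description)
    scored = [(name, float(query @ embed(desc))) for name, desc in ACTION_CATALOG.items()]
    return sorted(scored, key=lambda x: x[1], reverse=True)[:top_k]

print(retrieve_actions("notify the participant by email when the form is submitted"))
```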
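HiLDe’s uncertainty-highlighting idea can likewise be approximated as follows. This minimal sketch flags decoding steps whose next-token distribution has high Shannon entropy and surfaces the top alternatives for user review; the distributions, threshold, and function names are illustrative, and the sketch omits the system’s actual importance-score re-weighting:

```python
import math

def token_entropy(probs: dict) -> float:
    """Shannon entropy (bits) of one next-token distribution."""
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)

def flag_critical_tokens(step_distributions, threshold_bits: float = 1.0):
    """Mark decoding steps where the model is uncertain, i.e. where selective
    user review of alternative tokens is most valuable."""
    flagged = []
    for i, probs in enumerate(step_distributions):
        h = token_entropy(probs)
        if h >= threshold_bits:
            alts = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:3]
            flagged.append((i, h, alts))
    return flagged

# Illustrative per-step next-token distributions (not real model output).
steps = [
    {"md5": 0.48, "sha256": 0.44, "sha1": 0.08},  # uncertain, security-relevant choice
    {"(": 0.97, "[": 0.03},                       # near-deterministic, no review needed
]
print(flag_critical_tokens(steps))
```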
5. Limitations, Challenges, and Design Implications
Several challenges are recurring across empirical and conceptual analyses:
- Verification Burden: User studies show that cognitive costs of verifying AI-generated code constitute a significant, often underreported fraction of session time—exceeding 22% in realistic sessions (Mozannar et al., 2022). Deferred verification is prevalent and correlates with higher post-accept edit rates, indicating a nontrivial “verification debt.”
- Prompt Complexity and Expressiveness: Users frequently encounter a gulf between their mental model and the “language” of prompts or intermediate code expected by the model (Sarkar et al., 2022). The lack of transparency into provenance and the stochasticity of model outputs complicate integration and debugging.
- Control and Agency: There is concern that over-reliance on AI generation leads users to disengage from critical decision-making (“turn off their brains”), risking propagation of subtle errors. Agency-preserving designs—such as human-in-the-loop step selection, progressive disclosure, and actionable explanations—are posited as remedies (González et al., 28 May 2025).
- Enduring Roles for Code: Although generative AI broadens EUP’s task space, arguments remain for the fundamental relevance of formal code for control, trust, explainability, and debugging—alongside emergent “notational expertise” for prompt engineering (Sarkar, 2023).
6. Design Guidelines, Evaluation Metrics, and Future Directions
Effective AI-assisted end-user coding systems must incorporate:
- Metrics Beyond Acceleration: State-aware measurement (e.g., CUPS analysis) surfaces which stages—verification, prompting, waiting—are genuine human effort bottlenecks, suggesting optimizations beyond raw completion speed (Mozannar et al., 2022).
- Hybrid Interfaces: Interfaces that fuse textual prompts, direct code, visual workflow assembly, and side-by-side display of AI-generated artifacts enable progressive adoption and variable granularity of control (Weber, 5 Dec 2025, An et al., 4 Aug 2025).
- Integration of Testing and Validation: Systems are urged to support continuous validation—unit tests, type checks, coverage tracking, and reusability metrics—at every point of interaction. Best practices include design-staging files, external context management, test-driven prompting, and critical code review, as outlined in “Ten Simple Rules for AI-Assisted Coding in Science” (Bridgeford et al., 25 Oct 2025); a minimal test-driven prompting loop is sketched after this list.
- Pedagogical and Organizational Guidance: Embedding AI prompt-literacy in curricula, dedicated support channels, and compliance guardrails are recurrent recommendations for deployment at scale (Weber, 5 Dec 2025).
- Research Trajectories: There is a need for longitudinal studies, formal modeling of key system affordances (especially flexibility, reusability, lock-in), robust automated prompt-analysis, and broadened empirical coverage to include less technically inclined user populations (Weber, 5 Dec 2025, Sarkar, 2023).
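As a sketch of the test-driven prompting practice referenced above, the loop below accepts generated code only after it passes a small test suite and otherwise feeds the concrete failure back into the prompt. The `generate_code` callable, the `slugify` task, and the round limit are hypothetical placeholders, not prescriptions from the cited guidelines:

```python
import traceback

def run_tests(code: str) -> list[str]:
    """Execute the generated code and assert expected behaviour; return failures."""
    failures = []
    namespace: dict = {}
    try:
        exec(code, namespace)
        slugify = namespace["slugify"]
        assert slugify("Hello World!") == "hello-world"
        assert slugify("  a  b ") == "a-b"
    except Exception:
        failures.append(traceback.format_exc())
    return failures

def test_driven_prompt(generate_code, task: str, max_rounds: int = 3) -> str:
    """Generate, test, and re-prompt with the concrete failure until tests pass."""
    prompt = task
    for _ in range(max_rounds):
        code = generate_code(prompt)  # hypothetical wrapper around any LLM API
        failures = run_tests(code)
        if not failures:
            return code  # accepted only after the tests pass
        prompt = f"{task}\n\nYour previous attempt failed:\n{failures[0]}\nFix it."
    raise RuntimeError("No passing candidate within the round limit")
```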
7. Conceptual Shifts and Research Implications
The “generative shift hypothesis” posits a qualitative and quantitative expansion of EUP driven by generative AI (Sarkar, 2023). Foundational theoretical models (Blackwell’s attention investment, Ko’s learning barriers) must be revised to accommodate collapsed automation costs, attention shifted to prompt design and review, and the emergence of new “notational” learning curves for effective AI collaboration. Future interfaces are expected to support hybrid, mixed-modality workflows in which informality orchestrates automation but formality guarantees trust and correctness.
Recommended future research includes systematic analysis of prompt-to-artifact pipelines, hybrid composition (natural language plus code), provenance tracking, context-window management, and new efficacy benchmarks—such as pass@k for functional correctness or suggestion survival rates for end-user artifact persistence (Sarkar et al., 2022, Mozannar et al., 2022, Sarkar, 2023).
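For functional correctness, pass@k is typically computed with the standard unbiased estimator from the code-generation literature: with $n$ sampled generations per problem, $c$ of which pass the tests, $\text{pass@}k = 1 - \binom{n-c}{k}/\binom{n}{k}$, averaged over problems. A minimal, numerically stable sketch:

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k samples, drawn without
    replacement from n generations containing c correct ones, is correct."""
    if n - c < k:
        return 1.0  # every size-k draw necessarily contains a correct sample
    return float(1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1)))

# e.g. 200 samples per problem, 37 passing, evaluated at k = 1 and k = 10
print(pass_at_k(200, 37, 1), pass_at_k(200, 37, 10))
```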
AI-assisted end-user coding thus marks both a paradigm shift and a reconceptualization of core technical and human-centered issues in software creation, with open research challenges spanning usability, agency, methodological rigor, and formal system design.