
Agentic Coding Agents in Software Engineering

Updated 13 April 2026
  • Agentic coding agents are autonomous software engineering collaborators with capabilities for environment interaction, long-horizon planning, and dialogical feedback.
  • Their performance and impact are assessed with large-scale datasets and structured metrics such as PR acceptance rate and time-to-merge.
  • Their deployment in software projects enhances feature development while introducing challenges in review processes and heterogeneous adoption.

Agentic coding agents are autonomous software engineering collaborators with the capability to plan, execute, and iterate on complex development tasks in real-world project environments. They distinguish themselves from conventional code-completion tools by demonstrating environment awareness and tool use, long-horizon planning, and dialogical interaction. Recent research has formalized their role, workflow, and impact on software engineering using large-scale datasets and comprehensive empirical studies, establishing a rigorous foundation for the systematic study of agentic software engineering (Li et al., 9 Feb 2026).

1. Definition, Taxonomy, and Identification

Agentic coding agents differ fundamentally from basic code-completion tools through three interdependent properties: (1) environment awareness and tool use—enabling them to interact with the repository context, execute build/test commands, and manipulate multiple files; (2) long-horizon planning and iteration—allowing decomposition of complex issues and submission of incremental changes; and (3) dialogical interaction—permitting iterative engagement in human review loops, including responding to reviewer comments or CI feedback.

Formally, identification of an "Agentic-PR" in version control systems relies on metadata signatures:

$$\mathrm{AgenticPR}(p) = \begin{cases} 1 & \text{if } \mathrm{author}(p) \in A \text{ and } |\mathrm{diff}(p)| > 0 \\ 0 & \text{otherwise} \end{cases}$$

where $A$ is the set of recognized agent usernames, and $|\mathrm{diff}(p)| > 0$ ensures the PR reflects a substantive code change.
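The indicator above can be sketched in a few lines of Python. This is a minimal illustration, not the dataset's actual pipeline: the `PullRequest` fields, the example logins in `AGENT_LOGINS`, and the choice to measure the diff size as added plus deleted lines are all assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical agent logins, for illustration only; the real set A would come
# from the study's own list of recognized agent accounts.
AGENT_LOGINS = frozenset({"example-agent[bot]", "another-agent[bot]"})


@dataclass(frozen=True)
class PullRequest:
    author: str
    additions: int
    deletions: int


def diff_size(pr: PullRequest) -> int:
    """|diff(p)|, approximated here as added plus deleted lines (an assumption)."""
    return pr.additions + pr.deletions


def is_agentic_pr(pr: PullRequest, agents: frozenset = AGENT_LOGINS) -> int:
    """Return 1 if the PR author is a known agent and the diff is non-empty, else 0."""
    return int(pr.author in agents and diff_size(pr) > 0)
```

Under these assumptions, `is_agentic_pr(PullRequest("example-agent[bot]", 3, 1))` yields 1, while an empty diff from the same account, or any diff from an unlisted author, yields 0.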

Agentic coding agents can be grouped architecturally as:

  • Completion-Plus (OpenAI Codex, GitHub Copilot): Wrappers around code-completion models capable of launching PRs,
  • Planning-First (Devin, Cursor, Claude Code): Orchestrators integrating planning, implementation, and test cycles via sequences of LLM calls (Li et al., 9 Feb 2026).

2. Methodologies and Dataset Construction

The AIDev dataset was constructed by programmatically querying the GitHub API for pull requests authored by known agentic logins, capturing comprehensive PR-level, repository-level, and author-level metadata. For repositories with notable community engagement (≥100 stars), additional data such as inline code review comments, verdicts, commit diffs, and issue mappings were collected. All PRs were automatically categorized using a classifier based on the Conventional Commits taxonomy (e.g., feat, fix, test), providing fine-grained labels for downstream analysis.
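To make the labeling step concrete, the sketch below assigns a Conventional Commits type from a PR title. The text above does not describe how the dataset's classifier works internally, so this rule-based prefix matcher is only a simplified stand-in; the function name `task_type`, the `KNOWN_TYPES` set, and the `"other"` fallback are assumptions for illustration.

```python
import re

# Conventional Commits header: type, optional "(scope)", optional "!", then ": ".
_HEADER = re.compile(r"^(?P<type>[a-z]+)(\([^)]*\))?!?:\s")

# Commonly used types; this particular list is an assumption for the example.
KNOWN_TYPES = {"feat", "fix", "test", "docs", "chore", "refactor", "perf", "build", "ci", "style"}


def task_type(title: str) -> str:
    """Return the Conventional Commits type of a PR title, or "other" if none matches."""
    match = _HEADER.match(title.strip().lower())
    if match and match.group("type") in KNOWN_TYPES:
        return match.group("type")
    return "other"
```

A title such as `feat(api): add pagination` maps to `feat`, while a free-form title such as `Update README` falls through to `other`. Titles that do not follow the convention need a content-based classifier, which this heuristic does not attempt.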

AIDev Dataset Overview:

| Entity | Count | Description |
|---|---|---|
| all_pull_request | 932,791 | PR metadata |
| all_repository | 116,211 | Repository metadata |
| all_user | 72,189 | Developer metadata |
| pull_request (≥100★) | 33,596 | Curated subset |
| repository (≥100★) | 2,807 | Curated subset |
| pr_comments | 39,122 | Discussion threads |
| pr_reviews | 28,875 | Inline reviews |
| pr_commits | 88,576 | Linked commits |
| pr_commit_details | 711,923 | File diffs |
| related_issue | 4,923 | PR–issue mappings |
| pr_task_type | 33,596 | Automated task classification |

This approach enables large-scale, longitudinal analysis of agent-driven contributions and of their trends and impacts (Li et al., 9 Feb 2026).

3. Quantitative Metrics and Modeling

Quantitative assessment of agentic coding agents' impact employs several metrics, the two principal ones being:

  • PR Acceptance Rate for a given agent:

$$\mathrm{AR}_{\mathrm{agent}} = \frac{\sum_{p \in \mathcal{P}_{\mathrm{agent}}} \mathbf{1}[\mathrm{state}(p) = \text{merged}]}{|\mathcal{P}_{\mathrm{agent}}|}$$

  • Time-to-Merge for pull request $p$:

$$\mathrm{TTR}_p = t_{\mathrm{merge},p} - t_{\mathrm{open},p}$$

where $\mathcal{P}_{\mathrm{agent}}$ is the PR set per agent, and $t_{\mathrm{open}}$, $t_{\mathrm{merge}}$ are event timestamps.
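Both metrics are straightforward to compute once PR records are available. The following is a minimal sketch under stated assumptions: the `PR` record and its field names are hypothetical, and a PR is treated as merged exactly when it has a merge timestamp.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional, Sequence


@dataclass(frozen=True)
class PR:
    agent: str
    opened_at: datetime
    merged_at: Optional[datetime]  # None if the PR was closed unmerged or is still open


def acceptance_rate(prs: Sequence[PR]) -> float:
    """Fraction of PRs in the set that were merged."""
    if not prs:
        raise ValueError("acceptance rate is undefined for an empty PR set")
    return sum(pr.merged_at is not None for pr in prs) / len(prs)


def time_to_merge(pr: PR) -> Optional[timedelta]:
    """Merge timestamp minus open timestamp, or None for unmerged PRs."""
    if pr.merged_at is None:
        return None
    return pr.merged_at - pr.opened_at
```

Time-to-merge is only defined for merged PRs, so any aggregate over it (mean, median) should be taken over the merged subset rather than the full PR set.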

These metrics support further analyses such as comparing median merge latency by agent and task type, evaluating review intensity (via the count of comments or review events), and correlating review intensity with ultimate PR fate.
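As one illustration of such a follow-up analysis, the sketch below groups merged PRs by agent and task type and takes the median merge latency per group. The record layout (agent, task type, hours to merge) and the function name are assumptions chosen for the example, not the study's schema.

```python
from collections import defaultdict
from statistics import median


def median_hours_to_merge(records):
    """Median merge latency per (agent, task_type) group.

    records: iterable of (agent, task_type, hours_to_merge) tuples for merged PRs.
    """
    groups = defaultdict(list)
    for agent, task, hours in records:
        groups[(agent, task)].append(hours)
    return {key: median(values) for key, values in groups.items()}
```

The median is used rather than the mean because merge latencies tend to be dominated by a few long-running PRs; that choice is a common convention rather than something the text above prescribes.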

Findings indicate that agentic PRs are merged at a rate comparable to human-authored PRs in some contexts but take longer to merge (1.7× on average), which is attributed to greater review scrutiny and trust calibration (Li et al., 9 Feb 2026).

4. Empirical Results: Workflow, Task Types, and Collaboration

Application of automated task classification to the dataset reveals that ~45% of agentic PRs target new feature development (feat), ~30% bug repair (fix), ~15% testing (test), and the remainder documentation or chore-related tasks. Functional analysis indicates the following:

  • Feature PRs: Agents provide initial scaffolding (e.g., APIs, boilerplate), but human maintainers frequently modify code for naming accuracy, error-handling, and policy compliance.
  • Debugging PRs: Predominantly minimal one-off patches (5–20 lines); reviewers commonly demand added assertions or negative path tests.
  • Test PRs: Agents usually complement code patches with unit tests, but coverage for edge cases and negative paths is often prompted during review.

The dataset documents frequent dialogical PR refinement, e.g., agents modifying implementations in direct response to inline reviewer comments (substituting exception raising for Boolean returns in a validation function, as illustrated in the paper).
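The kind of change described here can be pictured with a small before/after pair. The paper's actual example is not reproduced in this text, so the functions below are hypothetical stand-ins: a port-range check written first as a Boolean-returning predicate and then as a validator that raises.

```python
def is_valid_port(value: int) -> bool:
    """Before: failure is signaled through the return value, which callers may ignore."""
    return 1 <= value <= 65535


def validate_port(value: int) -> int:
    """After: invalid input raises, so a failure cannot be silently dropped."""
    if not 1 <= value <= 65535:
        raise ValueError(f"port must be in 1..65535, got {value}")
    return value
```

A reviewer might request the second form because a forgotten `if not is_valid_port(...)` check lets bad input through, whereas an unhandled `ValueError` stops execution at the point of the error.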

Qualitatively, maintainers develop project-specific "interaction recipes," such as pre-assigning issues to agents and refining their output, demonstrating symbiotic human-machine collaboration and a shift toward "AI teammate" paradigms (Li et al., 9 Feb 2026).

5. Sociotechnical Adoption, Heterogeneity, and Implications

Agentic coding agents are now operational at scale ("routine actors" in SE 3.0), but their adoption profile is highly heterogeneous:

  • Large, mature repositories (≥1k stars) exhibit fewer agentic PRs and more intensive PR reviews,
  • Small and new repositories accept agent-driven contributions with minimal checks,
  • Agentic PRs require 1.7× longer to merge, reflecting skepticism and extra due diligence.

Predictive modeling using repository context features (template usage, previous bot activity) achieves a ROC-AUC of up to 0.75 for outcome prediction (accept vs. request-changes), indicating that measurable, context-sensitive signals drive acceptance decisions.
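For readers less familiar with the metric, ROC-AUC can be read as the probability that a randomly chosen positive example (here, an accepted PR) receives a higher score than a randomly chosen negative one. The sketch below computes it directly from that definition; it is a plain-Python illustration, not the study's model or evaluation code, and the function name is an assumption.

```python
def roc_auc(labels, scores):
    """ROC-AUC via pairwise comparison of positive and negative scores.

    labels: iterable of 0/1 outcomes; scores: predicted scores in the same order.
    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("ROC-AUC needs at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))
```

On the toy input `roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1])`, three of the four positive-negative pairs are ranked correctly, giving 0.75. The toy data is constructed for illustration and is unrelated to the dataset. The pairwise form is quadratic in the number of examples, so a rank-based implementation is preferable at scale.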

Practices are emerging in which developers treat agents as active teammates rather than passive tools: agents are assigned issues, and human maintainers review and merge their PRs and then layer refinements on top, producing intertwined authorship (Li et al., 9 Feb 2026).

6. Current Limitations and Research Directions

Several critical open challenges and dataset limitations are documented:

  • Partial Observability: The methodology only captures public GitHub projects using explicit bot accounts, omitting private, self-hosted, or stealth-integration scenarios.
  • Outcome, Not Correctness: Code quality (correctness, security, performance) is inferred indirectly from review and merge outcomes.
  • Identification Ambiguities: Possible misclassification arises if bot renaming obfuscates authorship, or if multi-agent workflows switch author accounts mid-PR.

Proposed research directions include:

  • Fine-grained, semantic assessment of agent-generated code (e.g., maintainability, vulnerability risk),
  • Longitudinal studies of developer skill evolution as agentic workflows become embedded,
  • Controlled field trials on new agentic governance models (e.g., mandatory CI gates for agent commits),
  • Expansion beyond GitHub, including multi-agent and cross-platform agent orchestration (Li et al., 9 Feb 2026).
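As a rough picture of what a governance rule such as a mandatory CI gate for agent commits could look like, the sketch below encodes one possible merge policy. Everything in it is hypothetical: the `MergeRequest` fields, the `may_merge` function, and the additional human-approval requirement are assumptions for illustration and are not proposed in the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MergeRequest:
    author_is_agent: bool
    ci_passed: bool
    human_approvals: int


def may_merge(req: MergeRequest, required_approvals: int = 1) -> bool:
    """Hypothetical policy: agent-authored changes must pass CI before merging.

    All changes, agent- or human-authored, also need the required number of
    human approvals (an assumption added for the example).
    """
    if req.author_is_agent and not req.ci_passed:
        return False
    return req.human_approvals >= required_approvals
```

A field trial of such a policy would compare outcomes, for example acceptance rate and time-to-merge, between repositories with and without the gate.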

The open release of the AIDev dataset enables systematic measurement and informed progress toward robust agentic software engineering, establishing agentic coding agents as first-class participants in contemporary development processes.
