Kimi-Dev: Hybrid Agentless & Agentic SWE Model
- Kimi-Dev is an open-source software engineering LLM that combines agentless (workflow-based) training with reinforcement learning to produce efficient debugging and patch-generation capabilities.
- It employs a structured two-role workflow, decomposing tasks between BugFixer and TestWriter to isolate atomic skills such as file localization, precise code editing, and self-reflection.
- Empirical results on SWE-bench Verified and in agentic settings validate its state-of-the-art performance and demonstrate a practical bridge between workflow-driven and interactive coding frameworks.
Kimi-Dev refers both to a specific open-source software engineering LLM and to the associated methodology of skill induction through agentless training, ultimately enabling efficient and adaptable coding agents. The approach centers on developing transferable skill priors (localization, code editing, and self-reflection) through a highly structured, single-turn workflow, and then adapting these priors to multi-turn agentic frameworks. Kimi-Dev demonstrates state-of-the-art performance on SWE-bench Verified and competitive results in full agentic scenarios, supporting its role as a bridge between workflow-driven and agentic SWE systems.
1. Structured Agentless Training Paradigm
Kimi-Dev’s core innovation is a multi-stage agentless training process focused on the acquisition of atomic skills necessary for automated software debugging and repair. Rather than purely relying on multi-turn agentic protocols, agentless training decomposes the problem into explicit roles within a workflow:
- Duo Framework: The workflow consists of two roles:
- BugFixer: Generates a patch that targets the detected bug.
- TestWriter: Produces test cases to reproduce the original reported issue and to validate the patch.
Both roles exercise atomic capabilities such as file localization and precise code editing; these skills are systematically embedded in the training regime.
This approach is designed to enable reasoning-intensive, single-turn steps that directly produce verifiable outputs, in contrast to the typical interactive, multi-turn agent frameworks.
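As a concrete illustration, the sketch below wires the two roles into a single-turn workflow. The `llm` callable, prompt wording, and helper names are assumptions for illustration, not the released implementation.

```python
from dataclasses import dataclass

@dataclass
class WorkflowResult:
    patch: str   # unified diff produced by the BugFixer role
    tests: str   # reproduction tests produced by the TestWriter role

def run_agentless_workflow(issue: str, llm) -> WorkflowResult:
    """Single-turn, two-role decomposition: each role first localizes files,
    then emits a complete, verifiable artifact (a patch or reproduction tests)."""
    # BugFixer: localize suspicious files, then produce a patch in one reasoning pass.
    fix_files = llm(f"Issue:\n{issue}\n\nList the files most likely to contain the bug.")
    patch = llm(f"Issue:\n{issue}\n\nRelevant files:\n{fix_files}\n\n"
                "Write a unified diff that fixes the bug.")

    # TestWriter: localize test files, then write tests that reproduce the issue.
    test_files = llm(f"Issue:\n{issue}\n\nList the test files where a reproduction test belongs.")
    tests = llm(f"Issue:\n{issue}\n\nTest files:\n{test_files}\n\n"
                "Write tests that fail before the patch and pass after it.")

    return WorkflowResult(patch=patch, tests=tests)
```

Because each role produces a complete artifact in one pass, its output can be checked directly against an executable environment without multi-turn interaction.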
2. Token-Level Skill Prior Induction
The training procedure commences with mid-training and cold-start supervised fine-tuning phases. These are summarized as follows:
- Base Model Selection: Kimi-Dev starts from Qwen 2.5-72B-Base, chosen for its robustness and general code-reasoning ability.
- Data Curation: Approximately 150B tokens are collected from real-world GitHub issues, pull requests, natural diff patches (50B tokens), curated PR commit packs (20B tokens), and synthetic chain-of-thought (CoT) reasoning traces.
- Supervised Fine-Tuning: This activates long-CoT reasoning, facilitating problem analysis, method sketching, and iterative self-refinement.
The methodology ensures that file localization, precise code editing, and self-reflection reasoning chains are represented in both the patch- and test-generation processes, forming the skill priors that support subsequent agentic adaptation.
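The data mixture can be summarized in a small configuration sketch. Field names are assumptions; only the token counts quoted above are taken from the description, and the split of the remaining tokens is left unspecified.

```python
# Illustrative mid-training data-mixture summary (field names are assumptions;
# only the quoted token counts come from the description above).
midtraining_mixture = {
    "total_tokens": 150e9,
    "sources": {
        "natural_diff_patches": 50e9,   # raw diffs mined from GitHub
        "pr_commit_packs": 20e9,        # curated pull-request commit packs
        # remaining ~80B tokens: GitHub issues, pull requests, and synthetic
        # chain-of-thought (CoT) reasoning traces (split not specified above)
    },
}
```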
3. Reinforcement Learning for Atomic Code Edits
Once localization and patch generation capabilities are established, reinforcement learning (RL) is applied to further optimize code editing steps:
- Policy Gradient RL: The reward reflects patch correctness using actual execution feedback. Training optimizes a group-wise policy-gradient objective of the form

$$\mathcal{L}(\theta) = -\,\mathbb{E}_{q,\,\{o_i\}_{i=1}^{G}\sim\pi_{\theta_{\text{old}}}}\left[\frac{1}{G}\sum_{i=1}^{G}\bigl(R(o_i)-\bar{R}\bigr)\,\log \pi_\theta(o_i\mid q)\right] + \beta\, D_{\mathrm{KL}}\!\left(\pi_\theta \,\Vert\, \pi_{\mathrm{ref}}\right),$$

where $R(o_i)$ is the outcome-based reward for sampled output $o_i$, $\bar{R}$ is the group-mean baseline, and $\pi_{\mathrm{ref}}$ is the reference policy.
- Adaptive Prompt Selection: RL leverages adaptive prompt selection and positive example reinforcement for training stability.
These steps focus the model’s capabilities towards reliably generating code edits that correctly resolve issues when tested in executable environments.
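A minimal sketch of such a group-wise policy-gradient update is shown below, assuming a group-mean baseline and a KL penalty toward the reference policy; the tensor shapes and the crude KL estimator are illustrative simplifications, not the exact training code.

```python
import torch

def groupwise_pg_loss(logprobs: torch.Tensor,
                      ref_logprobs: torch.Tensor,
                      rewards: torch.Tensor,
                      kl_coef: float = 0.1) -> torch.Tensor:
    """Group-wise policy-gradient loss over G sampled patches for one issue.

    logprobs / ref_logprobs: (G,) summed token log-probs under the policy / reference.
    rewards: (G,) outcome-based rewards from executing each candidate patch.
    """
    advantages = rewards - rewards.mean()                # group-mean baseline
    pg_term = -(advantages.detach() * logprobs).mean()   # REINFORCE-style policy gradient
    kl_term = (logprobs - ref_logprobs).mean()           # crude estimate of KL(pi_theta || pi_ref)
    return pg_term + kl_coef * kl_term
```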
4. Test-Time Self-Play and Autonomous Patch Selection
At inference (test) time, Kimi-Dev generates a diverse pool of candidate patches and test cases, allowing for autonomous patch selection:
- Self-Play Protocol: Typically, 40 patch candidates and corresponding tests are drawn.
- Evaluation Metric: Each candidate patch receives a score $s$ that combines fail-to-pass conversions with retained passes:

$$s = \frac{n_{f\to p}}{N_f} + \frac{n_{p\to p}}{N_p},$$

where $n_{f\to p}$ counts tests that transition from fail to pass post-patch, $n_{p\to p}$ counts tests that retain passing status, and $N_f$ / $N_p$ are the total numbers of originally failing/passing test cases.
- Patch Selection: The patch with the highest score $s$ is output, reflecting actual execution improvements rather than heuristic preference.
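A minimal sketch of this execution-based scoring and selection step, under the scoring form above, is given below; the test-result dictionaries and helper names are assumptions for illustration.

```python
def score_patch(before: dict[str, str], after: dict[str, str]) -> float:
    """Score a candidate patch from test statuses before/after applying it.
    `before` / `after` map test ids to "pass" or "fail"."""
    failing = [t for t, s in before.items() if s == "fail"]
    passing = [t for t, s in before.items() if s == "pass"]
    f2p = sum(after[t] == "pass" for t in failing)   # fail-to-pass conversions
    p2p = sum(after[t] == "pass" for t in passing)   # retained passing tests
    return (f2p / len(failing) if failing else 0.0) + \
           (p2p / len(passing) if passing else 0.0)

def select_best_patch(candidates: list[dict]) -> dict:
    """Pick the candidate patch with the highest execution-based score s."""
    return max(candidates, key=lambda c: score_patch(c["before"], c["after"]))
```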
This protocol operationalizes self-reflection and extended multi-turn reasoning at test time, which serves as the foundation for integrating agentic extension.
5. SWE-Bench Performance and Transferability
Kimi-Dev reaches a 60.4% success rate on SWE-bench Verified, reported as the highest among workflow-based (agentless) approaches to date. When further adapted via supervised fine-tuning on 5k SWE-Agent trajectories, Kimi-Dev achieves 48.6% pass@1 in agentic (multi-turn) settings, matching competitive closed-source models (e.g., Claude 3.5 Sonnet, version 241022).
These results indicate that agentless skill induction establishes a robust coding prior, which can be smoothly transferred to agentic frameworks—a substantial advancement for building adaptable, transferable coding agents.
6. Bridging Workflow and Agentic Frameworks
Kimi-Dev’s pipeline demonstrates that agentless training is not a mutually exclusive alternative to agentic protocols. Instead, the model leverages reasoning-intensive, structured workflows to develop underlying competencies, which are then efficiently reused within flexible multi-turn interaction paradigms.
- Skill Priors as Transfer Mechanism: By using workflow-derived skill priors, adaptation to agentic systems is expedited and stabilized.
- Hybrid Modality Advantages: Training agentic models fully from scratch is noted to suffer from long-horizon credit assignment and exploration issues. Kimi-Dev’s staged approach mitigates these limitations.
A plausible implication is that future SWE LLMs will increasingly leverage this hybrid training design to combine the efficiency of workflow induction with the flexibility of agentic reasoning.
7. Implications for Automated Coding Agents
Kimi-Dev’s methodology suggests that encoding structured skill priors through large-scale workflow-based supervised and RL training is a promising foundation for constructing versatile, robust coding agents capable of operating in both predefined workflow settings and open-ended interactive scenarios.
Rather than positioning agentless and agentic paradigms as alternatives, Kimi-Dev empirically validates skill transfer across both, supporting the development of next-generation automated debugging, patch generation, and testing systems.
In summary, Kimi-Dev integrates agentless training for skill prior induction, outcome-driven reinforcement learning, and autonomous self-play, culminating in state-of-the-art performance and highly transferable coding agents for SWE-bench and related tasks (Yang et al., 27 Sep 2025).