AgentPack: Collaborative Code-Editing Corpus
- AgentPack is a large-scale, real-world corpus of 1,337,012 co-authored code edits, integrating contributions from agents like Claude Code, OpenAI Codex, and Cursor Agent with human oversight.
- It employs a multi-stage pipeline—comprising event detection, merge filtering, patch extraction, and metadata integration—to ensure high-quality, naturally labeled code-editing examples.
- Fine-tuning experiments with AgentPack demonstrate notable performance gains in LLM-based code editors, emphasizing the value of multi-file edits and complex, agent-generated rationales.
AgentPack is a large-scale corpus comprising 1,337,012 real-world code edits co-authored by software-engineering agents (Claude Code from Anthropic, OpenAI Codex, and Cursor Agent) and humans in public GitHub repositories between April 1 and August 15, 2025. The primary objective of AgentPack is to provide high-quality, naturally labeled code-editing exemplars suitable for fine-tuning and analyzing LLMs on code-editing tasks. The dataset uniquely focuses on human–agent collaborative activity filtered via maintainer approval, distinguished by semantically scoped patches with detailed, agent-generated rationales and natural-language intentions (Zi et al., 26 Sep 2025).
1. Source Agents and Dataset Scope
AgentPack identifies code-editing events through agent-specific commit and pull request signatures:
- Claude Code: Commits contain the annotation `Co-Authored-By: Claude <[email protected]>`.
- OpenAI Codex: Pull request descriptions include links to `chatgpt.com/codex/tasks`.
- Cursor Agent: Commits bear the author line `Cursor Agent <[email protected]>` (see the detection sketch below).
The corpus spans 59 GB, drawn from material accepted into default branches ("main" or "master"). The time window commences one week after the Claude Code launch (2025-04-01) and ends on 2025-08-15. This interval captures the initial widespread adoption of these agents in open-source workflows.
2. Identification, Curation, and Quality Control Pipeline
AgentPack employs a multi-stage pipeline for data identification, curation, and quality assurance:
- Event Detection: Public GitHub Archive (GH Archive) events are queried for push and pull_request activity, matching agent-specific signatures.
- Repository Cloning and Merge Filtering: For each repository containing candidate agent–human co-authored edits, a shallow bare clone is performed, retaining only those commits merged into the default branch, thus leveraging the project maintainers' review process as an implicit human-in-the-loop quality control mechanism.
- Patch Extraction and Noise Reduction: The complete git diff (all hunks) is extracted per commit, with patches referencing files under `node_modules/` removed to avoid contaminating the corpus with third-party vendor code (see the sketch after this list).
- Metadata Integration: Commit metadata, including timestamp, agent label, and the original agent-generated description, is joined with patch content to form individual AgentPack items.
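A sketch of the patch-extraction and vendor-filter step (the git invocation and helper name are assumptions for illustration; only the `node_modules/` filter comes from the paper):

```python
import subprocess

def extract_patch(repo_dir: str, sha: str) -> str | None:
    """Extract a commit's full diff, dropping per-file patches under node_modules/."""
    diff = subprocess.run(
        ["git", "-C", repo_dir, "show", "--format=", sha],
        capture_output=True, text=True, check=True,
    ).stdout
    # Each per-file patch starts with a "diff --git a/... b/..." header line.
    kept = [
        p for p in diff.split("diff --git ")[1:]
        if "/node_modules/" not in p.split("\n", 1)[0]
    ]
    return "diff --git " + "diff --git ".join(kept) if kept else None
```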
An illustrative scoring function for prioritizing high-signal commits is defined as

$$S(c) = \alpha \cdot \frac{1}{f(c)} + \beta \cdot \log\bigl(1 + |m(c)|\bigr),$$

where $f(c)$ is the number of files touched by commit $c$ and $|m(c)|$ the length of its message: the $1/f(c)$ term favors edits affecting fewer files, the log term rewards message length, and $\alpha, \beta$ are tunable weights.
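A minimal sketch of this scoring function under the reconstruction above (the default weights are placeholders, not values from the paper):

```python
import math

def commit_score(num_files: int, msg_len: int,
                 alpha: float = 1.0, beta: float = 0.5) -> float:
    """Illustrative S(c): 1/files favors narrowly scoped edits; the log term
    rewards longer, more descriptive commit messages."""
    return alpha / max(num_files, 1) + beta * math.log1p(msg_len)
```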
3. Adoption Trends and Quantitative Metrics
AgentPack documents rapid adoption of LLM-based agents in software-engineering practice:
- Aggregate Commits Over April–August 2025:
- Claude Code: ~854,946 commits
- Codex: ~372,006 commits
- Cursor Agent: ~110,060 commits
Cumulative commit-rate functions $N_a(t)$ track agent-specific uptake, and the adoption ratio is given by

$$r_a(t) = \frac{N_a(t)}{\sum_{a'} N_{a'}(t)},$$

with $a \in \{\text{Claude Code}, \text{Codex}, \text{Cursor Agent}\}$. Claude Code exhibited the fastest growth (steep cumulative increases over May–June), with weekly volume peaking mid-May (~10,000 commits/week) before plateauing. Codex peaked in June, whereas Cursor Agent usage increased steadily but at a lower rate, suggesting more targeted deployment scenarios.
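As a snapshot over the full window, the adoption ratio can be computed directly from the aggregate counts above (the helper is illustrative):

```python
def adoption_ratio(counts: dict[str, int]) -> dict[str, float]:
    """r_a = N_a / sum over a' of N_a': each agent's share of attributed commits."""
    total = sum(counts.values())
    return {agent: n / total for agent, n in counts.items()}

print(adoption_ratio({"Claude Code": 854_946,
                      "Codex": 372_006,
                      "Cursor Agent": 110_060}))
# {'Claude Code': 0.639..., 'Codex': 0.278..., 'Cursor Agent': 0.082...}
```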
4. Structural Properties of Edits
AgentPack's code edits display distinct structural characteristics in comparison to historical human-only corpora:
- Median Per-Edit Statistics:
- Files touched: 2
- Patch size: 70 lines (added + removed)
- Hunks per file: 1.5
- Commit message length: 323 characters
Comparative benchmarks include CommitPackFT (single-file edits: patch size 4 lines, message length 43 chars) and CanItEdit (patch size 7 lines, message length 57 chars); thus, AgentPack edits are approximately 10× larger, with commit messages 6–10× longer.
Code-edit complexity is quantified as a weighted combination of patch size and scope,

$$C(e) = \ell(e) + \lambda \cdot f(e),$$

where $\ell(e)$ is the number of changed lines, $f(e)$ the number of files touched, and $\lambda$ a tunable weight. AgentPack's complexity distribution is bimodal, with modes at approximately 30 and 120, corresponding to a spectrum from rapid single-file fixes to multi-file refactorings.
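A toy instantiation of this metric (the linear form and the weight λ = 25 are assumptions chosen so that the two reported modes are reproduced, not values from the paper):

```python
def edit_complexity(lines_changed: int, files_touched: int,
                    lam: float = 25.0) -> float:
    """Composite complexity C(e) = lines + lam * files (lam is assumed).

    A 5-line single-file fix scores 30; a 70-line, 2-file edit scores 120,
    matching the two modes of the distribution described above.
    """
    return lines_changed + lam * files_touched
```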
| Dataset | Median Patch Size | Median Message Length |
|---|---|---|
| AgentPack | 70 lines | 323 chars |
| CommitPackFT | 4 lines | 43 chars |
| CanItEdit | 7 lines | 57 chars |
5. Fine-Tuning Experiments and Benchmarking
A subset of 118,848 Python-only edits (≤4096 tokens, ~120M tokens total) was used to fine-tune DeepSeekCoder 1.3B and 6.7B models from the EditCoder family. The prompt template was:
```
## Instruction:
{message}

## Code Before:
{old}

## Code After:
```
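A small helper (hypothetical, not from the paper's released code; `message` and `old` follow the template's placeholders) showing how one training example is rendered:

```python
def build_prompt(message: str, old: str) -> str:
    """Render one AgentPack example in the fine-tuning format above.

    During training, the post-edit code is appended after "## Code After:"
    as the completion target.
    """
    return (
        "## Instruction:\n"
        f"{message}\n\n"
        "## Code Before:\n"
        f"{old}\n\n"
        "## Code After:\n"
    )
```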
Training parameters included the AdamW optimizer, learning rate 2e-5, batch size 64, 3 epochs, and cosine decay with 10% warmup. Benchmark evaluations involved HumanEvalFix (bug-fixing) and CanItEdit tasks, using pass@1 (20 samples, temperature=0.2, top-p=0.95).
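For reference, pass@1 over 20 samples can be computed with the standard unbiased pass@k estimator of Chen et al. (2021); the helper below is a sketch, not the paper's evaluation harness:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: n generated samples per task, c of them correct."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# 20 samples per task, as in the evaluation above; e.g., with 5 correct samples:
print(pass_at_k(n=20, c=5, k=1))  # 0.25
```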
Results (HEF = HumanEvalFix pass@1; CI = CanItEdit pass@1; Δ = AgentPack − Base):
| Model | Base HEF | Base CI | EditCoder HEF | EditCoder CI | AgentPack HEF | AgentPack CI | Δ_HEF | Δ_CI |
|---|---|---|---|---|---|---|---|---|
| DeepSeekCoder-1.3B | 0.19 | 0.11 | 0.20 | 0.29 | 0.32 | 0.32 | 0.13 | 0.21 |
| DeepSeekCoder-6.7B | 0.39 | 0.30 | 0.45 | 0.42 | 0.48 | 0.41 | 0.09 | 0.11 |
AgentPack-driven fine-tuning produced significant gains relative to both the base models and the existing EditCoder models (significance assessed via paired bootstrapping). Ablations demonstrate that removing multi-file edits reduced pass@1 by ≈0.04, underscoring the value of complex, multi-file examples.
6. Distinctive Advantages and Limitations
Advantages:
- Rich natural-language intent articulation: commit messages in AgentPack are 6–10× longer than those in human-only datasets such as CommitPackFT and CanItEdit.
- Broader operational scope: includes multi-file changes, test additions, refactorings, and documentation edits.
- Human-vetted commit quality: inclusion is restricted to changes merged to mainline branches, leveraging existing project governance processes.
Limitations:
- Absence of original user prompts; only agent-generated rationales are present.
- Uncertainty regarding Cursor Agent's backend model identity due to lack of disclosure.
- Possible human modification of some agent-authored commits post-merge.
- Exclusively public-project data; excludes unmerged or private edits, introducing a potential repository bias.
7. Implications and Future Directions
AgentPack establishes a paradigm for constructing high-quality code-editing datasets grounded in real-world, human-vetted agent–human collaboration. A plausible implication is that the combination of human-in-the-loop quality control, enriched linguistic context, and increased structural scope has direct performance benefits for downstream LLM-based code editors.
Future avenues include extending coverage to additional programming languages and low-resource ecosystems, reconstructing original prompt–response dialogue contexts, and deploying AgentPack as an RL training environment for software-engineering agents in open-world settings. AgentPack serves as both a benchmark and resource for systematically studying agent integration into software development workflows and informs the next generation of code-editing system development (Zi et al., 26 Sep 2025).