
AgentPack: Collaborative Code-Editing Corpus

Updated 30 December 2025
  • AgentPack is a large-scale, real-world corpus of 1,337,012 co-authored code edits, integrating contributions from agents like Claude Code, OpenAI Codex, and Cursor Agent with human oversight.
  • It employs a multi-stage pipeline—comprising event detection, merge filtering, patch extraction, and metadata integration—to ensure high-quality, naturally labeled code-editing examples.
  • Fine-tuning experiments with AgentPack demonstrate notable performance gains in LLM-based code editors, emphasizing the value of multi-file edits and complex, agent-generated rationales.

AgentPack is a large-scale corpus comprising 1,337,012 real-world code edits co-authored by software-engineering agents—Claude Code (Anthropic), OpenAI Codex, and Cursor Agent—and humans in public GitHub repositories between April 1 and August 15, 2025. The primary objective of AgentPack is to provide high-quality, naturally labeled code-editing exemplars suitable for fine-tuning and analyzing LLMs on code-editing tasks. The dataset uniquely focuses on human–agent collaborative activity filtered via maintainer approval, distinguished by semantically scoped patches with detailed, agent-generated rationales and natural-language intentions (Zi et al., 26 Sep 2025).

1. Source Agents and Dataset Scope

AgentPack identifies code-editing events through agent-specific commit and pull-request signatures (a detection sketch follows this list):

  • Claude Code: Commits contain the annotation Co-Authored-By: Claude <noreply@anthropic.com>.
  • OpenAI Codex: Pull request descriptions include links to chatgpt.com/codex/tasks.
  • Cursor Agent: Commits bear the author line Cursor Agent <cursoragent@cursor.com>.
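
As a rough illustration, a detector over these signatures might look like the following Python sketch; the function name and event fields are assumptions for illustration, not the paper's implementation:

# Hypothetical signature matcher; field layout is illustrative only.
def classify_agent(commit_message: str, author_line: str, pr_body: str = "") -> str | None:
    if "Co-Authored-By: Claude" in commit_message:
        return "claude-code"      # Claude Code co-author trailer
    if "chatgpt.com/codex/tasks" in pr_body:
        return "openai-codex"     # Codex task link in the PR description
    if author_line.startswith("Cursor Agent"):
        return "cursor-agent"     # Cursor Agent commit author line
    return None                   # not an agent-authored event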

The corpus spans 59 GB, drawn from material accepted into default branches ("main" or "master"). The time window commences one week after the Claude Code launch (2025-04-01) and ends on 2025-08-15. This interval captures the initial widespread adoption of these agents in open-source workflows.

2. Identification, Curation, and Quality Control Pipeline

AgentPack employs a multi-stage pipeline for data identification, curation, and quality assurance (a condensed sketch follows the steps below):

  1. Event Detection: Public GitHub Archive (GH Archive) events are queried for push and pull_request activity, matching agent-specific signatures.
  2. Repository Cloning and Merge Filtering: For each repository containing candidate agent–human co-authored edits, a shallow bare clone is performed, and only commits merged into the default branch are retained; this leverages the project maintainers' review process as an implicit human-in-the-loop quality-control mechanism.
  3. Patch Extraction and Noise Reduction: The complete git diff (all hunks) is extracted per commit, with patches referencing files under node_modules/ removed to avoid contaminating the corpus with third-party vendor code.
  4. Metadata Integration: Commit metadata—including timestamp, agent label, and the original (agent-generated) description—is joined with patch content to form individual AgentPack items.
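
A condensed sketch of steps 3–4, assuming a local git clone; the git flags and the node_modules/ filter follow the description above, while function names and metadata fields are illustrative:

import subprocess

def extract_clean_patch(repo_dir: str, sha: str) -> str:
    # Step 3: full diff (all hunks) for one commit on the default branch.
    diff = subprocess.run(
        ["git", "-C", repo_dir, "show", "--format=", "--patch", sha],
        capture_output=True, text=True, check=True,
    ).stdout
    # Drop per-file sections touching vendored code under node_modules/.
    kept, keep = [], True
    for line in diff.splitlines(keepends=True):
        if line.startswith("diff --git"):
            keep = "node_modules/" not in line
        if keep:
            kept.append(line)
    return "".join(kept)

def build_item(sha: str, agent: str, timestamp: str, message: str, patch: str) -> dict:
    # Step 4: join commit metadata with patch content into one AgentPack item.
    return {"sha": sha, "agent": agent, "timestamp": timestamp,
            "message": message, "patch": patch}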

An illustrative scoring function for prioritizing high-signal commits is defined as

s(\text{commit}) = \alpha\, f_{\mathrm{scope}}(\text{commit}) + \beta\, f_{\mathrm{clarity}}(\text{message})

where f_{\mathrm{scope}} favors edits affecting fewer files and f_{\mathrm{clarity}} rewards message length, with \alpha, \beta > 0 as tunable weights.
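A minimal realization of this score; the concrete choices of f_scope and f_clarity below (reciprocal file count, log message length) are assumptions consistent with the stated monotonicity, not the paper's definitions:

import math

def commit_score(files_touched: int, message: str,
                 alpha: float = 1.0, beta: float = 1.0) -> float:
    f_scope = 1.0 / max(files_touched, 1)  # fewer files => higher score
    f_clarity = math.log1p(len(message))   # longer message => higher score
    return alpha * f_scope + beta * f_clarity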

3. Adoption Trends

AgentPack documents rapid adoption of LLM-based agents in software-engineering practice:

  • Aggregate Commits Over April–August 2025 (summing to the corpus total of 1,337,012):
    • Claude Code: 854,946 commits
    • Codex: 372,006 commits
    • Cursor Agent: 110,060 commits

Cumulative commit-rate functions C_a(t) track agent-specific uptake, and the adoption ratio is given by

R_a(t) = \frac{C_a(t)}{C_{\mathrm{total}}(t)}

with C_{\mathrm{total}}(t) = \sum_{a} C_a(t). Claude Code exhibited the fastest growth (a steep increase over May–June), peaking mid-May at roughly 10,000 commits/week, followed by a plateau. Codex peaked in June, whereas Cursor Agent usage increased steadily but at a lower rate, suggesting more targeted deployment scenarios.
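Computing R_a(t) from binned commit counts is straightforward; the weekly binning and input format here are assumptions:

from itertools import accumulate

def adoption_ratio(weekly_counts: dict[str, list[int]], agent: str) -> list[float]:
    # weekly_counts maps agent name -> commits per week (aligned bins).
    c_a = list(accumulate(weekly_counts[agent]))                        # C_a(t)
    c_total = list(accumulate(map(sum, zip(*weekly_counts.values()))))  # C_total(t)
    return [a / t if t else 0.0 for a, t in zip(c_a, c_total)]          # R_a(t)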

4. Structural Properties of Edits

AgentPack's code edits display distinct structural characteristics in comparison to historical human-only corpora:

  • Median Per-Edit Statistics:
    • Files touched: 2
    • Patch size: 70 lines (added + removed)
    • Hunks per file: 1.5
    • Commit message length: 323 characters

Comparative benchmarks include CommitPackFT (single-file edits: patch size 4 lines, message length 43 chars) and CanItEdit (patch size 7 lines, message length 57 chars); thus, AgentPack edits are approximately 10× larger, with commit messages 6–10× longer.

Code-edit complexity is quantified as

\mathrm{Complexity}(\mathrm{patch}) = \mathrm{files\_count} \times \log(1 + \mathrm{lines\_changed})

AgentPack's complexity distribution is bimodal at approximately 30 and 120, corresponding to a spectrum from rapid single-file fixes to multi-file refactorings.
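The metric in code, with two hypothetical patches spanning the quick-fix/refactoring spectrum (the example sizes are illustrative, not drawn from the dataset):

import math

def patch_complexity(files_count: int, lines_changed: int) -> float:
    # Complexity(patch) = files_count * log(1 + lines_changed)
    return files_count * math.log1p(lines_changed)

# e.g. a 1-file, 20-line fix vs. a 6-file, 400-line refactoring:
# patch_complexity(1, 20)  ≈ 3.0
# patch_complexity(6, 400) ≈ 36.0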

Dataset      | Median Patch Size | Median Message Length
AgentPack    | 70 lines          | 323 chars
CommitPackFT | 4 lines           | 43 chars
CanItEdit    | 7 lines           | 57 chars

5. Fine-Tuning Experiments and Benchmarking

A subset of 118,848 Python-only edits (≤4096 tokens each, ~120M tokens total) was used to fine-tune DeepSeekCoder 1.3B and 6.7B models, with the corresponding EditCoder fine-tunes as comparison baselines. The prompt template was:

## Instruction:
{message}

## Code Before:
{old}

## Code After:
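
A hypothetical renderer for this template (the field names message and old follow the placeholders above); the model is expected to generate the edited code after the final header:

PROMPT_TEMPLATE = (
    "## Instruction:\n{message}\n\n"
    "## Code Before:\n{old}\n\n"
    "## Code After:\n"
)

def build_prompt(message: str, old: str) -> str:
    # The model completes the prompt with the edited code ("Code After").
    return PROMPT_TEMPLATE.format(message=message, old=old)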

Training parameters included the AdamW optimizer, learning rate 2e-5, batch size 64, 3 epochs, and cosine decay with 10% warmup. Benchmark evaluations involved HumanEvalFix (bug-fixing) and CanItEdit tasks, using pass@1 (20 samples, temperature=0.2, top-p=0.95).
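
Expressed as Hugging Face TrainingArguments, the stated hyperparameters might look like this sketch; the output path and the split of batch size 64 across per-device batch and gradient accumulation are assumptions:

from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="agentpack-deepseekcoder",  # hypothetical path
    optim="adamw_torch",                   # AdamW
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=8,         # effective batch size 64
    num_train_epochs=3,
    lr_scheduler_type="cosine",            # cosine decay
    warmup_ratio=0.1,                      # 10% warmup
)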

Results:

Model              | Base HEF | Base CI | EditCoder HEF | EditCoder CI | AgentPack HEF | AgentPack CI | Δ_HEF | Δ_CI
DeepSeekCoder-1.3B | 0.19     | 0.11    | 0.20          | 0.29         | 0.32          | 0.32         | 0.13  | 0.21
DeepSeekCoder-6.7B | 0.39     | 0.30    | 0.45          | 0.42         | 0.48          | 0.41         | 0.09  | 0.11

(HEF = HumanEvalFix pass@1; CI = CanItEdit pass@1; Δ columns give the AgentPack model's gain over the base model.)

AgentPack-driven fine-tuning produced significant gains relative to both baseline and existing EditCoder models (p < 0.05, paired bootstrap). Ablations demonstrate that removing multi-file edits reduced Δ_HEF by ≈0.04, underscoring the value of complex, multi-file examples.
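
A sketch of the paired bootstrap significance test, assuming per-task pass@1 scores for the two systems being compared; resample count and seed are arbitrary choices:

import random

def paired_bootstrap_p(scores_a: list[float], scores_b: list[float],
                       n_resamples: int = 10_000, seed: int = 0) -> float:
    # One-sided p-value for "system A outperforms system B" on paired tasks.
    rng = random.Random(seed)
    n = len(scores_a)
    not_better = 0
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        delta = sum(scores_a[i] - scores_b[i] for i in idx) / n
        if delta <= 0:
            not_better += 1
    return not_better / n_resamples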

6. Distinctive Advantages and Limitations

Advantages:

  • Rich natural-language intent articulation: commit messages in AgentPack are roughly 6–10× longer than those in human-only datasets.
  • Broader operational scope: includes multi-file changes, test additions, refactorings, and documentation edits.
  • Human-vetted commit quality: inclusion is restricted to changes merged to mainline branches, leveraging existing project governance processes.

Limitations:

  • Absence of original user prompts; only agent-generated rationales are present.
  • Uncertainty regarding Cursor Agent's backend model identity due to lack of disclosure.
  • Possible human modification of some agent-authored commits post-merge.
  • Exclusively public-project data; excludes unmerged or private edits, introducing a potential repository bias.

7. Implications and Future Directions

AgentPack establishes a paradigm for constructing high-quality code-editing datasets grounded in real-world, human-vetted agent–human collaboration. A plausible implication is that the combination of human-in-the-loop quality control, enriched linguistic context, and increased structural scope has direct performance benefits for downstream LLM-based code editors.

Future avenues include extending coverage to additional programming languages and low-resource ecosystems, reconstructing original prompt–response dialogue contexts, and deploying AgentPack as an RL training environment for software-engineering agents in open-world settings. AgentPack serves as both a benchmark and resource for systematically studying agent integration into software development workflows and informs the next generation of code-editing system development (Zi et al., 26 Sep 2025).
