HAFixAgent: History-Aware Program Repair

Updated 5 March 2026

HAFixAgent is a history-aware, agent-based system for automated program repair that uses version-control data to significantly improve bug fixes and refactoring outcomes.
It employs blame-derived heuristics such as fn_names, fn_snapshots, and file_diff to extract detailed code evolution information, enhancing multi-hunk and complex bug repairs.
The agent architecture integrates context building, iterative execution, and sandboxed testing to optimize repair accuracy while maintaining computational and cost efficiency.

HAFixAgent is a history-aware, agent-based system for automated program repair (APR) and code maintenance. It uses version-control history, specifically commit blame information and associated heuristics, to augment the reasoning and editing capabilities of LLM-driven agents. By systematically incorporating historical context, HAFixAgent improves on non-history LLM and agentic baselines in software repair and refactoring tasks, especially on complex, multi-hunk bugs and in Haskell code refactoring. The following sections detail its design principles, heuristics, agentic mechanisms, quantitative performance, and practical considerations (Shi et al., 15 Jan 2025, Siddeeq et al., 24 Jun 2025, Shi et al., 2 Nov 2025).

1. Motivation and Design Principles

Traditional LLM- and agent-based repair systems focus primarily on the local snapshot of buggy code and often ignore the repository history available through version-control systems. However, empirical studies show that most real-world bugs are concentrated in lines recently touched by a small number of commits; frequently, a single "blame" commit captures the bug-introducing change (Shi et al., 2 Nov 2025). Mining this commit gives a system access to granular developer intent, co-evolution patterns, and the rationale for code evolution, all of which can enrich the bug-fixing and refactoring process. This principle underpins HAFixAgent, which injects history-aware context into the agent's prompt and achieves repair rates up to 3× higher than state-of-the-art non-history agentic baselines, particularly on multi-location and multi-file defects (Shi et al., 2 Nov 2025).

2. Blame-Derived Repository Heuristics

HAFixAgent formalizes a set of blame- and history-derived heuristics to extract actionable context from version-control systems. These heuristics are applied to the commit(s) identified via git blame over buggy lines:

fn_names: Collects all function signatures from files modified in the blame commit, capturing co-evolving functions and cross-hunk dependencies (Shi et al., 2 Nov 2025).
fn_snapshots: Annotates before-and-after code bodies for functions surrounding buggy lines, directly illustrating logical evolution across the critical commit (Shi et al., 2 Nov 2025).
file_diff: Provides the unified diff patch for the blame commit, yielding the minimal line-by-line delta associated with the most likely bug-introducing change (Shi et al., 2 Nov 2025, Shi et al., 15 Jan 2025).
Blameless fallback: For add-only or otherwise unblameable changes, the system looks up to five lines above the insertion for the nearest executable line and uses that line to anchor the history lookup (Shi et al., 2 Nov 2025).
Additional heuristics: In some instantiations (notably HAFix), up to seven variants are used, aggregating function-name sets by file, before/after code pairs around the commit, and raw diffs for greater coverage (Shi et al., 15 Jan 2025).

These heuristics provide orthogonal signals, with empirical Venn analysis indicating that each fixes unique subsets of bugs (Shi et al., 2 Nov 2025, Shi et al., 15 Jan 2025).
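The blame step behind these heuristics can be illustrated with a small sketch. It assumes the input is the output of `git blame --porcelain` on the pre-fix file, and it approximates an "executable line" as any line that is neither blank nor a comment. The function names (`parse_blame_porcelain`, `pick_blame_commit`, `fallback_anchor`, `file_diff`) and the majority-vote rule for choosing a commit are illustrative assumptions, not the tool's actual implementation.

```python
import re
import subprocess
from collections import Counter

# Porcelain entry header: "<40-hex sha> <orig line> <final line> [<group size>]"
_HEADER = re.compile(r"^([0-9a-f]{40}) (\d+) (\d+)(?: \d+)?$")


def parse_blame_porcelain(text):
    """Map each final line number to the commit that last touched it."""
    commits = {}
    for line in text.splitlines():
        match = _HEADER.match(line)
        if match:
            commits[int(match.group(3))] = match.group(1)
    return commits


def pick_blame_commit(commits, buggy_lines):
    """Return the commit blamed most often across the buggy lines, or None.

    Majority vote is an assumption made for this sketch.
    """
    hits = Counter(commits[n] for n in buggy_lines if n in commits)
    return hits.most_common(1)[0][0] if hits else None


def fallback_anchor(source_lines, insert_at, max_lookback=5):
    """Blameless fallback: nearest 'executable' line above a pure insertion.

    insert_at is the 1-based line where new code was inserted. 'Executable'
    is approximated as non-blank and not starting with a comment marker.
    """
    for n in range(insert_at - 1, max(insert_at - 1 - max_lookback, 0), -1):
        stripped = source_lines[n - 1].strip()
        if stripped and not stripped.startswith(("//", "#", "*", "/*")):
            return n
    return None


def file_diff(repo, commit):
    """file_diff heuristic: unified diff of the blame commit (needs git)."""
    return subprocess.run(
        ["git", "-C", repo, "show", "--format=", "--unified=3", commit],
        check=True, capture_output=True, text=True,
    ).stdout
```

In this sketch, the commit returned by `pick_blame_commit` would feed `file_diff`; the other two heuristics would read function signatures and function bodies from the same commit.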

3. Agent Architecture and Repair Execution

HAFixAgent comprises several coordinated modules, each fulfilling a distinct role in the agentic repair process:

Context Builder: Assembles the metadata context (bug report, failing tests, and fault localization), applies blame heuristics, and merges metadata with historical snippets into structured prompts (Shi et al., 2 Nov 2025).
Agent Execution Loop: Operates as an LLM-driven decision agent issuing discrete actions (typically bash commands or code edits). Each step executes actions in a sandbox, analyzes resulting test outputs, and iterates until all tests pass or a step/cost cap is reached (Shi et al., 2 Nov 2025).
Tools Module: Provides sandboxed shell utilities (e.g., grep, sed), build/test runners (e.g., defects4j), and interfaces for feedback collection (Shi et al., 2 Nov 2025).

When applied to refactoring tasks in functional programming, a multi-agent pipeline integrates components for code context extraction (AST and call graph), code smell detection (e.g., cyclomatic complexity, branching depth), strategy planning (mapping smells to refactoring patterns), refactoring execution, verification, and iterative debugging. Each agent communicates via structured payloads containing source code, context, and patch/test artifacts, with tightly specified role and output contracts (Siddeeq et al., 24 Jun 2025).

Repair Loop: Summary Pseudocode (Shi et al., 2 Nov 2025):

procedure HAFixAgent(bug b, heuristic h)
  ctx ← build_metadata_context(b)        // bug report, failing tests, fault localization
  c*  ← history_extractor(b)             // blame commit for the buggy lines
  hist ← extract_history(c*, h)          // fn_names, fn_snapshots, or file_diff
  prompt ← assemble_prompt(ctx, hist)
  transcript ← [prompt]
  for step in 1…MaxSteps do
    action ← LLM(transcript)
    feedback ← execute_in_sandbox(action)
    transcript.append(action, feedback)
    if all_tests_pass(feedback) then
      return extract_patch(action, transcript)
    end if
  end for
  return failure
end procedure
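The same loop can be written as a minimal runnable sketch. The LLM and the sandbox are passed in as plain callables so the control flow can be exercised without a model or a container. The `Feedback` type, the `repair` function, and their fields are hypothetical names introduced for illustration; the default of 50 steps mirrors the example cap mentioned under the practical recommendations.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Feedback:
    output: str                  # captured stdout/stderr from the sandbox
    tests_pass: bool             # True once the full test suite passes
    patch: Optional[str] = None  # current working-tree diff, if any


def repair(
    prompt: str,
    llm: Callable[[List[str]], str],
    sandbox: Callable[[str], Feedback],
    max_steps: int = 50,
) -> Optional[str]:
    """Iterate action -> feedback until tests pass or the step cap is hit."""
    transcript = [prompt]
    for _ in range(max_steps):
        action = llm(transcript)      # next bash command or code edit
        feedback = sandbox(action)    # execute it in isolation
        transcript += [action, feedback.output]
        if feedback.tests_pass:
            return feedback.patch
    return None  # step budget exhausted without a passing patch
```

Keeping the model and the sandbox behind callables is a design choice of this sketch: it lets a scripted policy stand in for the LLM in tests.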

4. Key Algorithms, Prompt Strategies, and Metrics

HAFixAgent’s formal algorithms involve decomposing context by heuristic, constructing tailored prompts, and aggregating candidate fixes:

Prompt Assembly: Sub-prompts per heuristic combine with a strong instruction-style template (e.g., “Please fix line X. Output only the fixed code.”), with empirical evidence that instruction-style prompts outperform masking or inline-labeled variants (Shi et al., 15 Jan 2025).
Aggregation Strategy: The agent runs the LLM on each sub-prompt and takes the union of all $n$-shot outputs; aggregate pass@k is then computed as $\mathrm{Pass@}k = 1 - \frac{\binom{N-C}{k}}{\binom{N}{k}}$, where $C$ is the number of correct fixes and $N$ the total number of candidates (Shi et al., 15 Jan 2025).
Refactoring Workflow: For Haskell, the pipeline is specified as a sequence of steps: context analysis, smell detection, mapping to 42 refactoring patterns (categorized by cleanup, readability, and performance), patch generation, testing, debugging, and summarization (Siddeeq et al., 24 Jun 2025).
Metrics:
- Repair accuracy: pass@1 or pass@k, typically using test-suite validation (Shi et al., 2 Nov 2025, Shi et al., 15 Jan 2025).
- Complexity/quality: cyclomatic complexity ($CC$), code metrics ($\Delta CC$, $\Delta Q_{HLint}$, etc.), as well as HLint warnings and GHC static checks (Siddeeq et al., 24 Jun 2025).
- Performance: runtime and memory usage ($\Delta P$, $\Delta M$) via profiling (Siddeeq et al., 24 Jun 2025).
- Efficiency: median agent steps, inference cost per bug (Shi et al., 2 Nov 2025).
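As a worked illustration of the pass@k formula and the union aggregation described above, the following sketch computes both with the standard library. Here `total` and `correct` correspond to $N$ and $C$; the function names and the dictionary layout (heuristic name mapped to a set of fixed bug IDs) are assumptions made for the example.

```python
from math import comb


def pass_at_k(total: int, correct: int, k: int) -> float:
    """Pass@k = 1 - C(total - correct, k) / C(total, k)."""
    if not 0 <= correct <= total or not 1 <= k <= total:
        raise ValueError("need 0 <= correct <= total and 1 <= k <= total")
    # comb returns 0 when k > total - correct, so the result is then 1.0
    return 1.0 - comb(total - correct, k) / comb(total, k)


def union_fixed(fixed_by_heuristic: dict) -> set:
    """Bugs fixed by at least one heuristic (union aggregation)."""
    return set().union(*fixed_by_heuristic.values())


if __name__ == "__main__":
    # 10 candidates, 3 correct: chance that one sampled candidate is correct
    print(round(pass_at_k(10, 3, 1), 3))
    print(sorted(union_fixed({"fn_names": {"b1", "b2"}, "file_diff": {"b2", "b3"}})))
```

With 10 candidates of which 3 are correct, pass@1 reduces to 3/10, which is a quick sanity check on the formula.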

5. Quantitative Results and Empirical Analysis

HAFixAgent substantially outperforms non-history and agentic baselines:

Defects4J (Java): On 829 bugs, the file_diff context fixes 523 (vs. 164 for the RepairAgent baseline), a relative improvement of +218.9%. Similar patterns are observed across the single-line, single-hunk, multi-hunk, and multi-file categories, with especially pronounced gains in difficult multi-hunk cases (file_diff: 175 vs. BIRCH-feedback: 133; +31.6%) (Shi et al., 2 Nov 2025).
Heuristic Contributions: The three history heuristics collectively add 194 unique fixes versus non-history configurations (32 unique); each heuristic targets different bug strata (Shi et al., 2 Nov 2025).
Efficiency: Adding history context does not significantly increase repair step count or median inference cost. Successful repairs typically converge within 12–32 steps depending on defect class. Median cost ranges from ~$0.005 to $0.029 per bug, with diminishing marginal cost over multiple heuristics (Shi et al., 2 Nov 2025).
Python (HAFix): On 51 single-line bugs, baseline (instruction) fixes 20 (39.22%), FLN-all fixes 22 (43.14%; +10%), and full HAFix-Agg (all 7 heuristics) fixes 29 (56.86%; +45%). EarlyStop strategies recover most gains at half the cost and time (Shi et al., 15 Jan 2025).
Haskell Refactoring: Across ten open-source projects, HAFixAgent reduces cyclomatic complexity by 11.03%, improves HLint suggestions by 22.46%, increases runtime efficiency by 13.27%, and reduces memory allocation by 14.57% on average (Siddeeq et al., 24 Jun 2025).

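Metric	HAFixAgent	Best Baseline	Relative Gain
Repair, all bugs (Defects4J, file_diff)	523 / 829	164 (RepairAgent)	+218.9%
Repair, multi-hunk (Defects4J, file_diff)	175 / 371	133 (BIRCH-feedback)	+31.6%
Single-line fix (Python)	29 / 51 (56.86%, HAFix-Agg)	20 / 51 (39.22%, instruction baseline)	+45%
Haskell: cyclomatic complexity reduction ($\Delta CC$)	11.03%	–	–
Haskell: HLint improvement ($\Delta Q_{HLint}$)	22.46%	–	–
Haskell: runtime improvement ($\Delta P$)	13.27%	–	–
Haskell: memory allocation reduction ($\Delta M$)	14.57%	–	–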
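The relative gains in the table follow from the raw counts as (ours − baseline) / baseline. A short check, with `relative_gain` as a helper name introduced only for this example:

```python
def relative_gain(ours: float, baseline: float) -> float:
    """Relative improvement over the baseline, in percent."""
    return 100.0 * (ours - baseline) / baseline


if __name__ == "__main__":
    print(f"{relative_gain(523, 164):.1f}%")  # Defects4J, all bugs   → 218.9%
    print(f"{relative_gain(175, 133):.1f}%")  # Defects4J, multi-hunk → 31.6%
    print(f"{relative_gain(29, 20):.1f}%")    # Python, single-line   → 45.0%
    print(f"{100 * 29 / 51:.2f}%")            # HAFix-Agg fix rate    → 56.86%
```

Note that the Python gain of +45% is relative to the number of bugs fixed (29 vs. 20), not the difference in percentage points (56.86% vs. 39.22%).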

6. Practical Recommendations and Trade-Offs

Empirical studies provide strong guidance for large-scale or industrial deployment:

Leverage version-control history: Always ground repair context in the bug-introducing (blame) commit, using unified diff as the primary heuristic (Shi et al., 2 Nov 2025).
Combine heuristics for coverage: Running two or three context heuristics in union captures ~90% of maximal bug-fix rates at half the cost of running all configurations (Shi et al., 2 Nov 2025, Shi et al., 15 Jan 2025).
Optimize for efficiency: Cap agent steps (e.g., 50) or budget ($1 USD) to prevent unbounded runs, using EarlyStop strategies where practical (Shi et al., 2 Nov 2025, Shi et al., 15 Jan 2025).
Haskell-specific best practices: Align the refactoring pattern catalog with project HLint configuration, integrate into CI via draft pull requests, and modularize agent runs for large codebases. The system is stateless by default, but performance tracking could inform future reinforcement learning enhancements (Siddeeq et al., 24 Jun 2025).
Generality of design: While initially validated on Java (Defects4J), Python (BugsInPy), and Haskell, the architecture and heuristics are extensible to other languages and software engineering tasks (e.g., test synthesis, review) (Shi et al., 2 Nov 2025, Siddeeq et al., 24 Jun 2025).
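The heuristic-union and budget recommendations can be combined in a simple driver. The sketch below assumes that early stopping means "stop at the first heuristic whose patch passes the tests" and that each attempt reports its own cost; `run_with_early_stop`, `attempt`, and the cost accounting are hypothetical names and conventions, not the published EarlyStop procedure.

```python
from typing import Callable, Iterable, Optional, Tuple


def run_with_early_stop(
    heuristics: Iterable[str],
    attempt: Callable[[str], Tuple[Optional[str], float]],
    budget_usd: float = 1.0,
) -> Tuple[Optional[str], Optional[str], float]:
    """Try heuristics in order; stop at the first passing patch, or before
    starting a new attempt once the budget has been spent.

    attempt(h) returns (patch or None, cost in USD) for one agent run.
    Returns (winning heuristic, patch, total spend).
    """
    spent = 0.0
    for h in heuristics:
        if spent >= budget_usd:
            break  # budget exhausted: do not start another run
        patch, cost = attempt(h)
        spent += cost
        if patch is not None:
            return h, patch, spent
    return None, None, spent
```

Ordering the list with file_diff first would follow the recommendation to treat the unified diff as the primary heuristic.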

7. Limitations and Future Directions

Key limitations include:

Assumption of perfect fault localization; degradations may occur in realistic settings with less reliable bug localization (Shi et al., 2 Nov 2025).
Single-language, single-benchmark evaluation within each study (e.g., Java on Defects4J for the agentic repair experiments); history concentration and tool performance may shift in other repository types (Shi et al., 2 Nov 2025).
Evaluation by test-passing only; patch maintainability and deeper semantic correctness are not addressed (Shi et al., 2 Nov 2025, Siddeeq et al., 24 Jun 2025).
Stateless learning; current systems do not leverage prior run statistics for reinforcement or meta-learning (Siddeeq et al., 24 Jun 2025).

Potential research avenues include richer history retrieval (multi-commit, refactor windows), more nuanced search for "blameless" bugs, automated patch quality assessment, and the development of multi-agent systems that explicitly separate history-analysis, synthesis, and review roles (Shi et al., 2 Nov 2025, Siddeeq et al., 24 Jun 2025). Expansion to broader software engineering domains represents a promising trajectory for the HAFixAgent paradigm.

References:

"HAFixAgent: History-Aware Automated Program Repair Agent" (Shi et al., 2 Nov 2025)
"HAFix: History-Augmented Large Language Models for Bug Fixing" (Shi et al., 15 Jan 2025)
"LLM-based Multi-Agent System for Intelligent Refactoring of Haskell Code" (Siddeeq et al., 24 Jun 2025)
