
Elixir: Effective OO Program Repair

Updated 17 March 2026
  • The paper introduces a novel generate-and-validate approach that expands the repair-expression space using aggressive method invocation synthesis.
  • ELIXIR employs an expressive repair-expression language and a machine-learned ranking model to generate, rank, and validate candidate patches.
  • Experimental evaluation on Defects4J and Bugs.jar demonstrates a significant boost in correct repairs compared to traditional repair tools.

Elixir is a generate-and-validate program repair technique for object-oriented (OO) languages, specifically motivated by the critical role of method invocations (MIs) in OO program structure and bug-fixing. This approach enables the synthesis of program patches that can aggressively incorporate method calls, markedly enlarging the repair-expression space and thereby addressing classes of OO bugs often out of reach for existing techniques. The ELIXIR system uses an expressive repair-expression language and a machine-learned ranking model to effectively generate, rank, and validate candidate patches, yielding significant improvements in the repair of real-world OO software defects (Saha et al., 2021).

1. Motivation: Method Invocations in Object-Oriented Repairs

Encapsulation in OO programming locates most data and operations behind public methods, making MIs such as obj.foo(a, b) the sole avenue for state access or mutation. Empirical analyses of large Java codebases (Eclipse JDT, Platform, BIRT) demonstrate that 57% of executable statements involve at least one MI, a figure substantially higher than the 33% seen in C programs. Moreover, 77% of one-line bug-fixes in such software involve MI changes—30–40% are stand-alone MI modifications, while others are embedded within conditional or assignment fixes.

Existing generate-and-validate repair tools are systematically limited in their ability to synthesize new or overloaded MIs. They typically:

  • Rely on copy-pasting existing code snippets (e.g., jGenProg) and cannot generate novel MI expressions not already in the code,
  • Apply constrained templates that do not synthesize new MIs or handle overloading (e.g., PAR),
  • Restrict MI handling to a curated subset of side-effect-free, parameterless methods strictly for guards (e.g., NOPOL).

The underlying obstacle is that unrestricted MI enumeration is computationally prohibitive, sometimes yielding hundreds or thousands of valid options per site. Prior techniques therefore restrict MI-based repair heavily and miss many real patches.

2. ELIXIR Framework and Repair-Expression Space

ELIXIR extends the classic four-step generate-and-validate paradigm via two principal advances: (a) a highly expressive repair-expression language that allows method calls on equal footing with variables, fields, and constants; and (b) a machine-learnt model to score and prioritize possible fixes for validation.

2.1 Framework Overview

Given a buggy program $P$, a test suite $T$ (with at least one failing test), and an optional bug report $R$, ELIXIR executes the following process:

  • Step A: Fault localization using SBFL (e.g., Ochiai) to identify suspicious statements.
  • Step B: Program transformation schemas ($\mathrm{T}_1 \ldots \mathrm{T}_8$) to generate candidate patches using the repair-expression language.
  • Step C: Machine-learnt scoring of candidate patches based on contextual and semantic features.
  • Step D: Validation via test-suite execution, returning the first plausible patch (i.e., one passing all tests).
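Step A can be illustrated with the standard Ochiai formula, which scores a statement by the number of failing tests that cover it divided by the square root of (total failing tests × tests covering it). The following is a minimal sketch, not ELIXIR's implementation; the function names and the coverage data are hypothetical.

```python
import math

def ochiai(failed_cover: int, passed_cover: int, total_failed: int) -> float:
    """Ochiai suspiciousness of one statement.

    failed_cover / passed_cover: failing / passing tests that execute the statement.
    total_failed: number of failing tests in the whole suite.
    """
    denom = math.sqrt(total_failed * (failed_cover + passed_cover))
    return failed_cover / denom if denom else 0.0

def rank_statements(coverage, total_failed):
    """Order statement ids from most to least suspicious (ties broken by id)."""
    scored = {s: ochiai(f, p, total_failed) for s, (f, p) in coverage.items()}
    return sorted(scored, key=lambda s: (-scored[s], s))

# Hypothetical coverage: statement id -> (failing tests covering it, passing tests covering it)
coverage = {
    "Foo.java:10": (2, 0),
    "Foo.java:11": (2, 6),
    "Foo.java:20": (0, 8),
}
ranking = rank_statements(coverage, total_failed=2)
```

A statement executed only by failing tests gets the maximum score, while one never executed by a failing test scores zero, so the transformation schemas are tried first at the most suspicious locations.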

Transformation Schemas

| Schema | Transformation Type | Description |
| --- | --- | --- |
| T1 | Type widening | int→long/float/double |
| T2 | Change return expr | Replace return with another compatible expr |
| T3/T4 | Conditional guards | Null or array/collection bounds guard |
| T5 | Boolean operator mutations | Relational/infix mutations (>, >=, <, <=, ==, !=) |
| T6 | Boolean predicate adjustments | Add/remove conjuncts/disjuncts |
| T7 | MI alteration | Replace object, method, arguments, or full MI |
| T8 | Insert new MI | Synthesize and insert arbitrary well-typed MI |

2.2 Repair-Expression Construction

Repair-expressions in ELIXIR follow the grammar:

  • literal ::= boolean | number | null
  • variable ::= id
  • field ::= id.id
  • array ::= id[expression]
  • methodInvocation ::= id(args) | id.id(args)

At a target location, ELIXIR systematically enumerates all combinations of in-scope locals, class fields, and accessible methods (including overloads), and builds all well-typed MI expressions up to a single composition depth. Formally, if $V$ denotes the available variables, fields, and literals and $M$ the set of method signatures (with average arity $a$), the candidate expression set is $RE = V \cup \{f(e) \mid f \in M,\ e \in V^n,\ \text{type-match}\}$, where $n$ is the arity of $f$, and $|RE| \approx v + m \cdot v^a$ for $v = |V|$, $m = |M|$.
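The enumeration can be sketched as a product over type-compatible arguments. This is an illustrative simplification, not ELIXIR's implementation: the atoms and method signatures are hypothetical, and types must match exactly (no subtyping, widening, or receiver objects).

```python
from itertools import product

# Hypothetical in-scope atoms (the set V): name -> static type.
atoms = {"i": "int", "n": "int", "s": "String"}

# Hypothetical accessible methods (the set M), overloads listed separately:
# (name, parameter types, return type).
methods = [
    ("max", ("int", "int"), "int"),
    ("repeat", ("String", "int"), "String"),
    ("length", ("String",), "int"),
]

def enumerate_repair_expressions(atoms, methods):
    """Return V plus every well-typed call f(e1..en) with arguments from V (depth one)."""
    exprs = [(name, typ) for name, typ in atoms.items()]
    for fname, params, ret in methods:
        # For each parameter, keep only the atoms whose type matches exactly.
        choices = [[a for a, t in atoms.items() if t == p] for p in params]
        for args in product(*choices):
            exprs.append((f"{fname}({', '.join(args)})", ret))
    return exprs

candidates = enumerate_repair_expressions(atoms, methods)

# Size of the space if type matching were ignored: v + sum over methods of v^arity.
untyped_bound = len(atoms) + sum(len(atoms) ** len(p) for _, p, _ in methods)
```

With these toy inputs the type filter leaves 10 candidates against an untyped bound of 3 + 9 + 9 + 3 = 24. Real call sites have far more atoms and methods, which is where the growth in $m \cdot v^a$ becomes the dominant cost.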

3. Machine-Learnt Patch Ranking

Due to the combinatorial explosion of candidate patches, ELIXIR employs a lightweight machine-learned ranking model to prioritize validation of the most promising candidates.

3.1 Classification Model

Each patch $p$ with repair-expression $r$ is scored as:

$$score(p \mid loc, R) = \sigma(w \cdot \phi(p, loc, R))$$

where $\sigma(t) = 1/(1+e^{-t})$ is the logistic function, $w \in \mathbb{R}^4$ are learned weights, and $\phi$ is a four-dimensional feature vector.
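As a minimal sketch of this scoring function, the snippet below applies the logistic function to a weighted sum of four feature values. The weights and feature values are hypothetical placeholders, not the learned values from the paper, and no bias term is included because the formula above does not show one.

```python
import math

def sigmoid(t: float) -> float:
    """Logistic function 1 / (1 + e^-t)."""
    return 1.0 / (1.0 + math.exp(-t))

def score(weights, features) -> float:
    """Relevance score of one candidate: sigmoid of the dot product w . phi."""
    if len(weights) != 4 or len(features) != 4:
        raise ValueError("expected four weights and four features")
    return sigmoid(sum(w * x for w, x in zip(weights, features)))

# Hypothetical weights for (distance, context similarity, bug-report similarity, frequency).
weights = [1.0, 2.0, 1.5, 0.5]

# Hypothetical feature vectors for two candidate repair-expressions.
close_match = [0.9, 0.6, 0.4, 3.0]
poor_match = [0.1, 0.0, 0.0, 0.0]

close_score = score(weights, close_match)
poor_score = score(weights, poor_match)
```

Because the logistic function is monotonic, ranking candidates by this score is equivalent to ranking them by the raw weighted sum; the sigmoid only maps the result into the interval (0, 1).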

3.2 Features

  • $\phi_1$ (Distance Score): Proximity of $r$’s elements to $loc$ within the source.
  • $\phi_2$ (Contextual Similarity): Jaccard similarity of CamelCase-split tokens in $r$ versus code context.
  • $\phi_3$ (Bug Report Similarity): Jaccard similarity of repair-expression tokens with those in $R$ (if available).
  • $\phi_4$ (Context Frequency): Occurrence count of variables/fields from $r$ within $\pm 3$ lines of $loc$.
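The two similarity features both rest on Jaccard similarity over CamelCase-split tokens. The sketch below shows one way to compute it; the tokenization rules (letters only, lowercased, acronyms kept whole) are assumptions made for illustration and may differ from ELIXIR's, and the sample strings are hypothetical.

```python
import re

def camel_tokens(text: str) -> set:
    """Split every identifier in text into lowercase CamelCase word tokens."""
    tokens = set()
    for ident in re.findall(r"[A-Za-z]+", text):
        # An uppercase run not followed by lowercase is an acronym; otherwise a word.
        for part in re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", ident):
            tokens.add(part.lower())
    return tokens

def jaccard(a: set, b: set) -> float:
    """|a intersect b| / |a union b|, defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical repair-expression and surrounding code context.
repair_expr = "getMaxValue(limit)"
context = "int maxValue = limit + offset;"

similarity = jaccard(camel_tokens(repair_expr), camel_tokens(context))
```

The same two functions would serve for the bug-report feature by passing the report text in place of the code context.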

3.3 Training Process

Training uses 1,158 one-line bug-fixes from Bugs.jar, balancing “positive” (developer-chosen) and “negative” repair-expressions (4× oversampling positives, ≈1,580 data points). Ridge-regularized logistic regression is implemented via WEKA, with 10-fold cross-validation. At inference, patches are sorted by predicted relevance, and the top $N = 50$ are validated.
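The inference step, combined with test-suite validation, can be sketched as a sort followed by a bounded search. This is a simplified illustration under assumed interfaces: `score_fn` and `passes_all_tests` are hypothetical callables standing in for the ranking model and the test runner.

```python
def first_plausible_patch(candidates, score_fn, passes_all_tests, top_n=50):
    """Validate the top_n highest-scoring candidates and return the first that passes.

    Returns None when no candidate among the top_n passes the test suite.
    """
    ranked = sorted(candidates, key=score_fn, reverse=True)
    for patch in ranked[:top_n]:
        if passes_all_tests(patch):
            return patch
    return None

# Hypothetical candidates with precomputed scores and test outcomes.
scores = {"patch_a": 0.2, "patch_b": 0.9, "patch_c": 0.7, "patch_d": 0.8}
plausible = {"patch_a", "patch_c"}

found = first_plausible_patch(scores, scores.get, plausible.__contains__, top_n=3)
missed = first_plausible_patch(scores, scores.get, plausible.__contains__, top_n=2)
```

The example shows why the cutoff matters: a plausible patch ranked third is found with a budget of three validations but missed with a budget of two, so ranking quality directly limits what the tool can repair.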

4. Experimental Evaluation

4.1 Datasets

  • Defects4J [Just et al. 2014]: Commons-Math, Commons-Lang, Joda-Time, JFreeChart. 82 single-hunk bugs selected.
  • Bugs.jar: Eight major Apache projects, filtered to 1,158 single-hunk bugs (each with buggy version, unit tests, developer patch, and report).

4.2 Baselines and Metrics

ELIXIR is benchmarked against ACS, HD-Repair, NOPOL, PAR’ (a re-implementation of PAR), jGenProg, and two ELIXIR ablations: Elixir₁ (traditional patch space, no ML) and Elixir₂ (rich patch space, random top-N selection). Each patch is classified as “correct” (semantically equivalent to the developer fix) or “incorrect plausible” (passes all tests but is not equivalent).

4.3 Results

Correct and Incorrect Repairs (Defects4J):

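| Subject | ELIXIR | ACS | HD-Repair | NOPOL | PAR' | jGenProg |
| --- | --- | --- | --- | --- | --- | --- |
| Commons-Math | 12/7 | 12/4 | 6/(*) | 1/20 | 2/NR | 5/13 |
| Commons-Lang | 8/4 | 3/1 | 7/(*) | 3/4 | 1/NR | 0/0 |
| Joda-Time | 2/1 | 1/0 | 1/(*) | 0/1 | 0/NR | 0/7 |
| JFreeChart | 4/3 | 2/0 | 2/(*) | 1/5 | 0/NR | 0/2 |
| Total (82) | 26/15 | 18/5 | 16/(10*) | 5/30 | 3/NR | 5/22 |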

Ablation Impact

| Variant | Repair-Exprs | Selection | Correct | Incorrect |
| --- | --- | --- | --- | --- |
| Elixir₁ | Traditional (ACS-like) | None (no ML) | 14 | 16 |
| Elixir₂ | Extended (ELIXIR) | Random top-N | 13 | 5 |
| Elixir | Extended | Logistic reg | 26 | 15 |

Schema Contribution

| Schema | Correct | Incorrect |
| --- | --- | --- |
| Change in MI (T7) | 12 | 6 |
| Boolean expr change | 6 | 8 |
| New MI insertion (T8) | 3 | 0 |
| Type widening | 2 | 0 |
| Return expr change | 2 | 0 |
| Null/size guard (T3/T4) | 1 | 1 |

Results on Bugs.jar (Sampled 127 single-hunk bugs)

  • ELIXIR: 22 correct / 17 incorrect
  • Elixir₁: 14 correct / 16 incorrect

This reflects an 85% boost in correct repairs on Defects4J (from 14 to 26) and a 57% improvement on Bugs.jar (from 14 to 22) over the Elixir₁ baseline.
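The two percentages follow from the relative increase in correct repairs over the 14 obtained by Elixir₁ on each benchmark. As a worked check (the quoted 85% appears to be the first value truncated rather than rounded):

$$\frac{26 - 14}{14} = \frac{12}{14} \approx 0.857, \qquad \frac{22 - 14}{14} = \frac{8}{14} \approx 0.571$$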

5. Insights, Limitations, and Future Directions

The primary insight is that the expressive MI-focused repair-expression space enables ELIXIR to address entire bug classes missed by prior tools. This efficacy depends on the ranking model’s ability to surface correct patches among hundreds or thousands of candidates. The model’s four features (locality, code-context similarity, bug-report alignment, and usage frequency) jointly capture signals that have proven effective in automated repair, code completion, and bug localization.

ELIXIR’s principal limitations include its restriction to single-hunk patches, its reliance on a bug report for the bug-report similarity feature $S_{br}$ ($\phi_3$ in Section 3.2), and the simplicity of its feature set and logistic model. Because the repair-expression language and ranking model are Java-specific (implemented via Spoon and ASM), generalization to other OO languages would require additional grammar and AST transformation work.

Potential extensions include:

  • Integration with more sophisticated machine-learning models (e.g., random forests, neural models),
  • Expansion to multi-location/method repairs,
  • Cross-combination with oracle-based synthesis (e.g., Angelix/NOPOL),
  • Extension to other OO languages by adapting language-aware grammars.

The results suggest that a generate-and-validate approach, augmented with aggressive MI synthesis and lightweight relevance ranking, can significantly expand the class of OO bugs amenable to fully automated repair (Saha et al., 2021).

References (1)
