
LTFix

Updated 1 July 2025
  • LTFix is an automated program repair approach that combines large language models with formal typestate analysis to effectively target complex memory-related errors in C programs.
  • It specifically addresses interprocedural bugs and LLM context limitations by using typestate tracing to extract only semantically relevant code context for repair.
  • Evaluations show LTFix significantly outperforms state-of-the-art APR tools and LLM agents in fixing real-world C memory errors with greater efficiency and accuracy.

LTFix is an automated program repair (APR) approach that targets memory-related errors in C programs at the repository level, particularly focusing on bugs that span multiple functions and files and challenge both traditional and contemporary APR methodologies. Its distinguishing feature is the integration of LLMs with formal typestate analysis, enabling semantic context extraction for effective and scalable repair of complex memory errors.

1. Objectives and Conceptual Innovations

LTFix is designed to resolve critical memory errors—such as use-after-free, double-free, and memory leaks—that arise from intricate program logic and C's manual memory management practices. Its core innovation is to combine the general reasoning abilities of LLMs with a typestate-guided context retrieval system. This hybrid model addresses two critical challenges in memory error repair: the difficulty of comprehending interprocedural memory management patterns and the context window limitations imposed by LLMs.

Unlike template-based APR or deep learning models that depend on extensive, bug-specific data or rigid pattern libraries, LTFix leverages program semantics to guide LLM reasoning. This synergy results in enhanced interpretability, precision, and coverage for challenging real-world C bugs. In a large-scale evaluation across 11 open-source projects encompassing over a million lines of code, LTFix repaired 31 out of 40 developer-confirmed memory errors, outperforming state-of-the-art static and LLM-based APR tools.

2. Memory Error Repair Challenges and Motivation

The principal technical hurdles addressed by LTFix are:

  • Interprocedural Complexity: Many memory bugs are not localized; their causes and symptoms are distributed over several functions or files. For example, an allocation in one function may be incorrectly freed or reused in another, with aliasing compounding the risk of errors such as double-frees or leaks.
  • LLM Context Window Limitations: Effective repair in such scenarios often requires repository- or program-wide context. However, existing LLMs are constrained by input token limits, and naive extraction of file- or module-level code frequently results in overly large or unfocused prompts.

LTFix addresses these by constructing a semantically minimal, typestate-driven trace of the error's propagation path and supplying only these critical program contexts to the LLM, thus achieving both relevance and efficiency in repair.

3. Typestate-Guided Context Retrieval

Central to LTFix is the use of a finite typestate automaton (FTA) to formally represent and trace the evolution of memory objects through program states:

$$\mathcal{A}_{\text{ET}} = \langle \Sigma, \mathbb{T}, T_u, \delta, T_{ET} \rangle$$

where:

  • $\Sigma$ is the set of operations (e.g., alloc, free, use, realloc, set_null),
  • $\mathbb{T}$ is the set of typestates (e.g., uninitialized, live, dead, and respective error states such as $T_{\text{UAF}}$, $T_{\text{DF}}$, $T_{\text{ML}}$),
  • $T_u$ is the initial typestate,
  • $\delta$ defines typestate transitions driven by program operations,
  • $T_{ET}$ is the error typestate for a specific error type.

During program execution (e.g., as replayed under GDB), LTFix tracks the state evolution of the implicated memory address, extracting only those code locations, transitions, and call stacks where relevant typestate changes occur (identified by the automaton). This process outputs a concise "context trace"—an ordered sequence of program states and transitions directly responsible for the error.

For example, in a use-after-free (UAF) bug, the automaton will register transitions such as $T_u \to T_l$ (alloc) and $T_l \to T_d$ (free), and identify the erroneous transition $T_d \to T_{\text{UAF}}$ (use after free). Only the code responsible for these transitions is included in the LLM prompt.

This approach ensures that the LLM receives only essential semantic information required for precise reasoning, mitigating the token limitation problem and sharpening the focus of the repair.

4. System Workflow and Implementation

The operational workflow of LTFix consists of several key stages:

  1. Dynamic Error Tracing: Utilizing debugger-based execution (e.g., GDB), the system traces the program's runtime behavior, logging all typestate-relevant operations and context transitions affecting the error-prone memory address.
  2. Typestate-Driven Trace Extraction: Guided by the FTA, only statements and transition points critical to the error's propagation are extracted for inclusion in the context trace.
  3. Compact Contextual Prompting: The extracted, semantically dense trace is formatted into a concise prompt for the LLM, often orders of magnitude smaller than a naive full-file or full-module context.
  4. LLM-Based Repair Generation: The LLM receives the prompt and generates candidate patches aimed at correcting the identified memory error (e.g., adding a missing free, fixing a double-free, preventing a use-after-free).
  5. Validation and Prioritization: Generated patches are tested to ensure that they resolve the memory error (verified by test suites or dynamic analysis) and do not introduce new faults.

Ablation studies indicate that omitting typestate-guided context tracing significantly reduces fix rates and increases the likelihood of harmful patches, whereas LTFix's approach both maximizes accuracy and minimizes unnecessary LLM computation.

5. Evaluation and Performance

Empirical evaluation used a benchmark set comprising 40 real-world, developer-confirmed memory errors from industrial and research-grade open-source C projects. The main findings were:

  • Correct Repairs: LTFix repaired 31 of the 40 errors (77.5%), fixing on average 14.5 and 3.43 more errors than SAVER and ProveNFix, respectively, and nearly doubling the number of fixes achieved by the LLM agent SWE-agent.
  • Efficiency: LTFix processed nearly twice as many errors while using approximately 41 times fewer LLM tokens compared to the baseline LLM agent (saving approximately 17 million tokens).
  • Repair Complexity: LTFix uniquely succeeded on interprocedural errors and memory issues involving pointer aliasing, cyclic realloc patterns, and multi-location leaks that resisted resolution by static or purely neural methods.
  • Patch Quality: Patches produced by LTFix were both accurate (validated against developer fixes) and less likely to introduce new errors than those from file- or function-level prompting.

The table summarizes these findings:

| Aspect | Summary |
| --- | --- |
| Objective/Innovation | Hybrid LLM and typestate-driven memory error repair at repository scale |
| APR Distinction | No reliance on templates or large memory-bug datasets; interprocedural patterns; semantic focus |
| Challenges Addressed | Interprocedural complexity and LLM token limits |
| Typestate Context Retrieval | Only FTA-relevant, minimal, semantically critical context provided to the LLM |
| Evaluation | Bested all tested baselines and agents in accuracy, coverage, and efficiency |
| Implication/Future | Foundational for explainable and scalable hybrid repair tools |

6. Implications and Future Directions

LTFix establishes the efficacy of hybrid, analysis-augmented program repair for low-level, semantically complex software faults. Key implications and potential extensions include:

  • Paradigm Shift: Demonstrates that integrating formal semantics (typestate automata) and LLM-based reasoning can achieve repository-scale, explainable, and precise automated repair well beyond localized or template-based approaches.
  • Token Efficiency and Scaling: Semantic curation enables LLM use on large codebases, greatly increasing both fix rates and efficiency.
  • Explainability and Trust: The trace-based rationale provided for both error origin and fix improves auditability—an important consideration for safety-critical systems.
  • Generalization Potential: While focused on memory errors in C, the approach suggests a path toward handling other semantic bugs (e.g., resource leaks, concurrency violations) and may be adapted for languages with manual resource management.

Directions for future research include extension to concurrency-related errors, integration with static specification mining, adaptation to multi-threaded/async contexts, and broader language support (e.g., C++, Rust, Java).

7. Summary

LTFix advances the state of automated memory error repair in C by leveraging typestate-guided tracing to distill and deliver only the semantically central context to LLMs. This strategy overcomes major limitations in previous APR systems and current LLM-based agents, yielding high-precision, efficient, and explainable repository-level repairs for complex, distributed memory bugs. The typestate-guided context retrieval mechanism is central: it bridges the semantic gap between traditional program analysis and neural reasoning, thus providing a foundation for next-generation, hybrid, automated software maintenance tools.