RecovSlicing: Dynamic Data Dependency Recovery
- RecovSlicing is an advanced computational methodology that recovers dynamic data dependencies in software debugging using partial instrumentation and LLM-driven inference.
- It synthesizes missing execution traces and accurately maps variable definitions, including implicit data structures, through adaptive context generation.
- Benchmarks show RecovSlicing achieves up to 98.3% accuracy and significantly improves bug localization, outperforming traditional slicing techniques.
RecovSlicing is an advanced computational methodology for recovering dynamic data dependencies in software debugging tasks by leveraging partial instrumentation and LLMs to infer missing execution information. Primarily designed for object-oriented programs where exhaustive instrumentation or replicated runs are costly or impossible, RecovSlicing reconstructs omitted steps in the execution trace and accurately identifies the dynamic definition of a variable at a target program step. The technique supports both explicit variables and implicit data structures, such as those accessed through library APIs, by recovering runtime values and aligning them to observed memory states. Its integration in automated debugging workflows has yielded substantial improvements in regression bug localization and general slice analysis.
1. Computational Motivation and Problem Definition
The central challenge addressed by RecovSlicing is identifying the dynamic definition of a variable at a given program step when only a partially recorded execution trace is available. Traditional dynamic slicing techniques require either (i) exhaustive instrumentation that logs every read and write event during program execution or (ii) costly replicated runs to re-examine which assignment produced the runtime value. These strategies falter in practical settings where much of the execution passes through library code or where program behavior is non-deterministic. RecovSlicing circumvents these limitations by using LLMs to simulate the missing portions of execution, informed by the recorded trace and static code. The method answers the core debugging question: "Why does this variable have this value at this step?"
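To make the problem concrete, the following is a minimal sketch of a partial trace and of the naive "last recorded writer" lookup that breaks down when steps are missing. The `Step` schema, the function names, and the file locations are hypothetical illustrations, not the data model used by RecovSlicing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One recorded execution step (hypothetical schema, for illustration)."""
    index: int                        # position in the full execution
    location: str                     # source location, e.g. "Main.java:12"
    writes: frozenset = frozenset()   # variables written at this step
    reads: frozenset = frozenset()    # variables read at this step


def naive_dynamic_definition(trace, var, target_index):
    """Return the latest recorded step before target_index that writes var."""
    for step in reversed(trace):
        if step.index < target_index and var in step.writes:
            return step
    return None


def has_gap_between(trace, start_index, end_index):
    """True if any step strictly between the two indices was not recorded."""
    recorded = {s.index for s in trace}
    return any(i not in recorded for i in range(start_index + 1, end_index))


# Steps 2 and 3 ran inside uninstrumented library code and are missing.
trace = [
    Step(0, "Main.java:10", writes=frozenset({"list"})),
    Step(1, "Main.java:11", writes=frozenset({"x"})),
    Step(4, "Main.java:13", reads=frozenset({"x"})),
]

candidate = naive_dynamic_definition(trace, "x", target_index=4)
unreliable = has_gap_between(trace, candidate.index, 4)
```

Here the naive lookup returns step 1, but `unreliable` is true because an unrecorded step may have overwritten `x` in between. That gap is the part of the execution RecovSlicing aims to recover.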
2. Technical Workflow and Recovery Algorithm
RecovSlicing operates via a two-component process: variable recovery and definition inference.
Variable Recovery:
An adaptive context generation strategy produces a "synthesized" in-context version of the code suitable for full instrumentation. This synthetic program, combined with the partial trace and code base, is given to an LLM, which predicts both the value and the structural representation (object graph) of the queried variable at the target program step. The method formally defines a variable's access path (see LaTeX formulation in the original work) that maps variable identities across execution and object fields.
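The idea of an access path can be sketched as a root variable followed by a chain of field or index selectors that is resolved against an object graph. This is an assumed, simplified representation; the original work gives its own formal definition, and the `AccessPath` class and the nested-dictionary object graph below are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Tuple, Union

Selector = Union[str, int]  # field name or container index


@dataclass(frozen=True)
class AccessPath:
    """Root variable plus selectors, e.g. order.items[1].price (illustrative)."""
    root: str
    selectors: Tuple[Selector, ...] = ()

    def resolve(self, frame: dict) -> Any:
        """Follow the selectors through an object graph of dicts and lists."""
        value = frame[self.root]
        for sel in self.selectors:
            value = value[sel]
        return value

    def __str__(self) -> str:
        parts = [self.root]
        for sel in self.selectors:
            parts.append(f"[{sel}]" if isinstance(sel, int) else f".{sel}")
        return "".join(parts)


# A recovered object graph for one stack frame, modelled as nested containers.
frame = {"order": {"items": [{"price": 5}, {"price": 7}]}}
path = AccessPath("order", ("items", 1, "price"))
value = path.resolve(frame)
```

Representing the path separately from the value lets the same identity be looked up in both the observed and the synthesized state.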
Definition Inference:
Given the reconstructed value, RecovSlicing traverses the partial execution trace backward. Using a memory aliasing algorithm (Algorithm 1), the technique builds a mapping between variable instances and their memory addresses, aligned between the observed and synthesized traces. The process locates the most recent assignment of the queried variable, even if the definition arises in third-party code or through indirect aliasing. Prompt design for the LLM is critical, with structured examples tailored to minimize hallucinations and encourage correct trace estimation.
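The backward traversal can be illustrated with a small sketch that merges recorded and recovered steps, resolves a variable to its address through an alias map, and walks backward to the most recent write to that address. This is not Algorithm 1 from the original work; the `TraceStep` type, the alias map layout, and the addresses are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence, Tuple

Location = Tuple[int, str]  # (object address, field name)


@dataclass(frozen=True)
class TraceStep:
    index: int
    source: str                          # "recorded" or "recovered"
    writes: Tuple[Location, ...] = ()    # memory locations written


def merge_traces(recorded: Sequence[TraceStep],
                 recovered: Sequence[TraceStep]):
    """Interleave observed and LLM-recovered steps by execution order."""
    return sorted(list(recorded) + list(recovered), key=lambda s: s.index)


def find_definition(trace: Sequence[TraceStep], aliases: Dict[str, int],
                    var: str, field_name: str,
                    target_index: int) -> Optional[TraceStep]:
    """Walk backward to the latest write to the location var refers to."""
    address = aliases.get(var)
    if address is None:
        return None
    wanted = (address, field_name)
    for step in reversed(trace):
        if step.index < target_index and wanted in step.writes:
            return step
    return None


recorded = [TraceStep(0, "recorded", ((0x10, "size"),)),
            TraceStep(3, "recorded")]
recovered = [TraceStep(1, "recovered", ((0x20, "size"),)),
             TraceStep(2, "recovered", ((0x10, "size"),))]
aliases = {"a": 0x10, "b": 0x10}  # a and b refer to the same object

trace = merge_traces(recorded, recovered)
definition = find_definition(trace, aliases, "b", "size", target_index=3)
```

Because `a` and `b` share an address, the query on `b.size` finds the recovered step 2 even though that write was never observed and was performed through a different name.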
3. Quantitative Evaluation and Benchmarking
RecovSlicing has been benchmarked against existing slicing approaches, including Slicer4J, ND-Slicer, LLM-Slicer, and a re-execution slicer, across three slicing datasets encompassing 8,300 data dependency queries. On the LLM Generated dataset, RecovSlicing achieved 80.3% accuracy and recall, outperforming the best baseline (LLM-Slicer) at 39.0% accuracy and 53.4% recall. On the other datasets, RecovSlicing reported up to 98.3% accuracy and recall (vs. the best baseline at 59.9% and 87.1%, respectively). These results indicate a stronger capacity to recover dependencies from sparse traces and under incomplete instrumentation or non-deterministic control flow.
Benchmark | RecovSlicing Accuracy | Best Baseline Accuracy | RecovSlicing Recall | Best Baseline Recall |
---|---|---|---|---|
LLM Generated | 80.3% | 39.0% | 80.3% | 53.4% |
ND-Slicer Dataset | 91.1% | 82.0% | 91.1% | 79.1% |
LLM-Slicer Dataset | 98.3% | 59.9% | 98.3% | 87.1% |
4. Instrumentation, Alias Management, and Execution Recovery
A central innovation of RecovSlicing is its capability to operate with minimal instrumentation. Only application code is instrumented, while dependencies within external libraries are dynamically recovered via LLM estimation. This is particularly advantageous in environments with heavy library interaction, where full instrumentation would be prohibitive. RecovSlicing features sophisticated memory alignment and alias inference (Algorithm 2), ensuring that variable references—direct or indirect—are accurately mapped and traced over execution steps, even through containers or object pools.
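The split between instrumented application code and uninstrumented library code can be pictured with a small filter. This is only a sketch under assumptions: the package prefix `com.example.app.` and the helper names are hypothetical, and the actual instrumentation mechanism is not described here.

```python
APP_PREFIXES = ("com.example.app.",)  # hypothetical application packages


def should_instrument(class_name, app_prefixes=APP_PREFIXES):
    """Instrument only classes that belong to the application itself."""
    return class_name.startswith(tuple(app_prefixes))


def split_calls(call_sequence, app_prefixes=APP_PREFIXES):
    """Partition executed classes into recorded steps and unrecorded gaps."""
    recorded, gaps = [], []
    for index, class_name in enumerate(call_sequence):
        bucket = recorded if should_instrument(class_name, app_prefixes) else gaps
        bucket.append((index, class_name))
    return recorded, gaps


calls = ["com.example.app.Cart", "java.util.ArrayList", "com.example.app.Cart"]
recorded, gaps = split_calls(calls)
```

In this picture, the entries in `gaps` are the steps whose effects would have to be estimated rather than observed.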
In cases of non-deterministic behavior (e.g. concurrent execution, runtime randomness), where re-execution yields divergent traces, RecovSlicing reconstructs plausible execution states for missing slices using statistical inference guided by the recorded history and code semantics.
5. Application in Automated Debugging and Bug Localization
RecovSlicing has been integrated into dual-slicing–based regression bug localization workflows. In the Defects4J benchmark, the bug localizer augmented with RecovSlicing achieved a debugging success rate of 89%, compared to 73% with the baseline Tregression method. This improvement is partly attributable to RecovSlicing’s ability to recover dependencies that would otherwise be missed in traces with partial or missing instrumentation.
The technique is particularly suited to object-oriented languages that rely on dynamic allocation, supporting scenarios where a variable’s definition is implicit or spans boundaries between user and third-party code.
6. Limitations and Future Directions
Despite substantial accuracy and coverage gains, several challenges remain:
- Hallucination: Careful prompt design and context selection are required to prevent the LLM from proposing unjustified or spurious execution paths.
- Language Coverage: The technique is currently optimized for major object-oriented languages and may require adaptation for scripting, functional, or low-level languages.
- Trace Recovery Completeness: As with all partial recovery methods, RecovSlicing’s precision depends on the quality and representativeness of the observed instrumentation. Further improvements could arise from enriched in-context learning modules and advanced prompt engineering.
- Tool Integration: Future research objectives include deeper integration with debugging IDEs and expansion to live time-traveling debugging scenarios.
7. Significance and Advancements in Slicing Methodologies
RecovSlicing marks a paradigm shift in dynamic dependency recovery for program slicing. Its principled fusion of static analysis, lightweight instrumentation, and state-of-the-art LLM recovery addresses longstanding challenges of cost, coverage, and determinism in debugging workflows. By computing dependencies with high fidelity and recall in a single run, it unlocks practical time-traveling debugging capabilities and supports fine-grained regression analysis. The method’s adaptability to implicit variables and recovery across interprocedural boundaries underscores its relevance in contemporary large-scale programming environments.