Improving Fault Localization by Integrating Value and Predicate Based Causal Inference Techniques

Published 11 Feb 2021 in cs.SE | (2102.06292v1)

Abstract: Statistical fault localization (SFL) techniques use execution profiles and success/failure information from software executions, in conjunction with statistical inference, to automatically score program elements based on how likely they are to be faulty. SFL techniques typically employ one type of profile data: either coverage data, predicate outcomes, or variable values. Most SFL techniques actually measure correlation, not causation, between profile values and success/failure, and so they are subject to confounding bias that distorts the scores they produce. This paper presents a new SFL technique, named \emph{UniVal}, that uses causal inference techniques and machine learning to integrate information about both predicate outcomes and variable values to more accurately estimate the true failure-causing effect of program statements. \emph{UniVal} was empirically compared to several coverage-based, predicate-based, and value-based SFL techniques on 800 program versions with real faults.