- The paper’s main contribution is proposing a unified method to quantify variable importance by measuring the variability of partial dependence functions.
- The method applies to any model from which predictions can be obtained, yielding importance scores that are directly comparable across diverse algorithms such as neural networks and support vector machines.
- Experimental evaluations on simulated and real datasets demonstrate its reliability for variable interpretation and potential for detecting interactions.
A Simple and Effective Model-Based Variable Importance Measure
The paper "A Simple and Effective Model-Based Variable Importance Measure" by Greenwell, Boehmke, and McCarthy presents a standardized approach to calculating variable importance using partial dependence plots (PDPs) across various supervised learning models. The motivation stems from the challenge of interpreting complex models, such as neural networks and support vector machines, and gaining insights into the relative influence of predictor variables in big data scenarios.
Key Contributions
The authors propose a standardized method in which variable importance is quantified by the variability of the partial dependence function: a predictor whose partial dependence function is essentially flat contributes little to the predictions and therefore receives low importance. The approach is applicable to any model from which predictions can be obtained, thereby unifying the measurement of variable importance across different algorithms. This is particularly useful because traditional measures of variable importance are model-specific, such as the Gini importance in random forests or the coefficients in linear models.
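In notation (a sketch based on the standard definition of partial dependence; the symbols here are chosen for exposition and are not quoted from the paper), let $\hat{f}$ be the fitted model, $x_1$ the predictor of interest evaluated at grid points $x_{11}, \dots, x_{1k}$, and $\mathbf{x}_{iC}$ the observed values of the remaining predictors for observation $i$. Then

$$
\bar{f}_1(x_1) = \frac{1}{n}\sum_{i=1}^{n} \hat{f}\left(x_1, \mathbf{x}_{iC}\right),
\qquad
i(x_1) = \operatorname{sd}\left\{\bar{f}_1(x_{11}), \dots, \bar{f}_1(x_{1k})\right\}.
$$

A perfectly flat partial dependence function therefore scores zero, while a predictor whose partial dependence varies strongly receives a large importance.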
Methodology
The method involves computing the partial dependence of the model's predictions on each predictor variable by averaging predictions over the remaining variables. A variable's importance is then the spread of its partial dependence values: for continuous predictors, the sample standard deviation; for categorical predictors, a range-based analogue of the standard deviation.
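To make the recipe concrete, below is a minimal Python sketch for a single predictor. The model object and its `predict` method, the pandas/NumPy data handling, and the function name `pdp_importance` are assumptions (any fitted model that returns numeric predictions for a data frame would do); the range-divided-by-four rule for categorical predictors is one reading of the paper's range-based estimate, so the exact divisor should be checked against the original text.

```python
import numpy as np
import pandas as pd

def pdp_importance(model, X, feature, grid_values=None):
    """PDP-based importance of a single predictor (illustrative sketch)."""
    # Grid of values at which the partial dependence is evaluated.
    values = grid_values if grid_values is not None else np.unique(X[feature])

    pd_values = []
    for v in values:
        X_mod = X.copy()
        X_mod[feature] = v                                       # clamp the feature at v
        pd_values.append(float(np.mean(model.predict(X_mod))))   # average over the rest
    pd_values = np.asarray(pd_values)

    if pd.api.types.is_numeric_dtype(X[feature]):
        # Continuous predictor: sample standard deviation of the PD values.
        return pd_values.std(ddof=1)
    # Categorical predictor: range-based analogue (range / 4 here; the exact
    # divisor is an assumption of this sketch).
    return (pd_values.max() - pd_values.min()) / 4.0
```

Looping this over every column and sorting the resulting scores yields a single ranking regardless of the underlying model.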
This approach has several advantages:
- Consistency across model types, which makes it particularly useful for ensemble methods such as super learners, where the base models may otherwise report importance on incomparable scales.
- Support for interaction detection: the importance of one predictor can be computed conditional on the values of a second predictor, and the variability of these conditional importance scores signals a potential interaction (see the sketch after this list).
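As a rough illustration of the interaction idea in the last bullet, the sketch below (same assumptions as before: a fitted model exposing `predict` and pandas-style data; the symmetrizing average of the two directions is a simplification, not a quotation of the paper's definition) builds a two-way partial dependence surface and scores an interaction by how much one feature's conditional importance varies across the values of the other.

```python
import numpy as np

def interaction_strength(model, X, feat_a, feat_b):
    """PDP-based interaction score between two predictors (illustrative sketch)."""
    vals_a = np.unique(X[feat_a])
    vals_b = np.unique(X[feat_b])

    # Two-way partial dependence surface: rows index feat_a, columns index feat_b.
    surface = np.empty((len(vals_a), len(vals_b)))
    for i, a in enumerate(vals_a):
        for j, b in enumerate(vals_b):
            X_mod = X.copy()
            X_mod[feat_a] = a
            X_mod[feat_b] = b
            surface[i, j] = float(np.mean(model.predict(X_mod)))

    # Importance of feat_a at each fixed value of feat_b, and vice versa; the
    # spread of these conditional importances indicates a potential interaction.
    imp_a_given_b = surface.std(axis=0, ddof=1)   # one score per value of feat_b
    imp_b_given_a = surface.std(axis=1, ddof=1)   # one score per value of feat_a
    return (imp_a_given_b.std(ddof=1) + imp_b_given_a.std(ddof=1)) / 2.0
```

For continuous predictors, a coarse grid (e.g., deciles) would normally replace `np.unique` to keep the number of model calls manageable.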
Experimental Evaluation
The paper provides a comprehensive evaluation with both simulated and real-world datasets:
- The Ames housing dataset and the Friedman regression problem are employed to demonstrate the effectiveness of the proposed method.
- Comparisons are made with methods like the Garson and Olden algorithms for neural networks, and the authors show that their method identifies important variables more reliably.
Implications and Future Directions
The implications of this paper are substantial for both practice and theory. For practitioners, the method offers a broadly applicable interpretability tool, especially for complex models that are otherwise difficult to interpret. Theoretically, it opens avenues for refinement and robustness improvements: the authors acknowledge potential sensitivity to outliers and suggest substituting more robust statistics, such as the median absolute deviation.
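As a concrete example of such a tweak, the standard deviation in the earlier sketch could be swapped for the median absolute deviation; the helper below is an illustration of that substitution, not something prescribed by the paper.

```python
import numpy as np

def mad(values):
    """Median absolute deviation: a robust drop-in for the standard deviation."""
    values = np.asarray(values, dtype=float)
    return float(np.median(np.abs(values - np.median(values))))
```

Replacing `pd_values.std(ddof=1)` in `pdp_importance` with `mad(pd_values)` would dampen the influence of a few extreme partial dependence values.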
Furthermore, the method's relationship to interaction detection tools such as Friedman's H-statistic highlights its utility for exploring complex interaction effects, an increasingly prominent concern in modern machine learning applications.
Conclusion
In summary, Greenwell et al. provide a flexible, broadly applicable approach to understanding the influence of predictor variables in machine learning models. Their standardized variable importance measure, built on PDPs, addresses limitations of existing model-specific measures and answers the growing demand for interpretability in machine learning. The methodology not only improves our ability to interpret complex models but also encourages further research on model transparency and trustworthiness.