
Factor Importance Ranking and Selection using Total Indices (2401.00800v2)

Published 1 Jan 2024 in stat.ME and stat.ML

Abstract: Factor importance measures the impact of each feature on output prediction accuracy. Many existing works focus on model-based importance, but an important feature in one learning algorithm may hold little significance in another model. Hence, a factor importance measure ought to characterize a feature's predictive potential without relying on a specific prediction algorithm. Such algorithm-agnostic importance is termed intrinsic importance in Williamson et al. (2023), but their estimator again requires model fitting. To bypass the modeling step, we present the equivalence between predictiveness potential and total Sobol' indices from global sensitivity analysis, and introduce a novel consistent estimator that can be computed directly from noisy data. Integrating it with forward selection and backward elimination gives rise to FIRST, Factor Importance Ranking and Selection using Total (Sobol') indices. Extensive simulations demonstrate the effectiveness of FIRST on regression and binary classification problems, along with its clear advantage over state-of-the-art methods.
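
The equivalence the abstract refers to can be made concrete. Under squared-error loss, the law of total variance gives

$$\mathbb{E}\big[(Y - \mathbb{E}[Y \mid X_{-j}])^2\big] - \mathbb{E}\big[(Y - \mathbb{E}[Y \mid X])^2\big] = \mathbb{E}\big[\mathrm{Var}(\mathbb{E}[Y \mid X] \mid X_{-j})\big],$$

so the drop in optimal predictive power from removing $X_j$ is exactly the unnormalized total Sobol' index of $X_j$ for the regression function $\mathbb{E}[Y \mid X]$. The Python sketch below illustrates one way such an index can be estimated directly from data, pairing each point with its nearest neighbor in the complementary coordinates $X_{-j}$ via a k-d tree (cf. Bentley 1975 in the references). This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `total_sobol_nn` is hypothetical, and the paper's noise-variance handling and its forward-selection/backward-elimination wrapper are omitted.

```python
# Hedged sketch (not the authors' implementation) of estimating total
# Sobol' indices directly from data via nearest neighbors. The paper's
# noise correction and FIRST's selection loop are omitted for brevity.
import numpy as np
from scipy.spatial import cKDTree

def total_sobol_nn(X, y):
    """Estimate T_j = E[Var(Y | X_{-j})] / Var(Y) for each column j of X.

    Points that are nearest neighbors in the complementary coordinates
    X_{-j} act as approximate replicates with X_{-j} held fixed, so
    0.5 * mean((Y - Y')^2) over such pairs estimates E[Var(Y | X_{-j})]
    (a Jansen-type identity applied to nearest-neighbor pairs).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    var_y = y.var()
    indices = np.empty(p)
    for j in range(p):
        X_rest = np.delete(X, j, axis=1)            # drop feature j
        _, nn = cKDTree(X_rest).query(X_rest, k=2)  # k=2: first hit is self
        e_cond_var = 0.5 * np.mean((y - y[nn[:, 1]]) ** 2)
        indices[j] = e_cond_var / var_y
    return indices

# Toy check: X[:, 2] never enters the response, so its index should sit
# near the noise floor while the two active features score higher.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(2000, 3))
y = np.sin(np.pi * X[:, 0]) + 0.5 * X[:, 1] ** 2 + 0.1 * rng.normal(size=2000)
print(total_sobol_nn(X, y))
```

FIRST then ranks factors by such total indices and couples the estimator with forward selection and backward elimination to return a sparse set of important factors.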

References (62)
  1. Asuncion, A. and Newman, D. (2007), “UCI machine learning repository,” URL https://archive.ics.uci.edu/.
  2. Azadkia, M. and Chatterjee, S. (2021), “A simple measure of conditional dependence,” The Annals of Statistics, 49, 3070–3102.
  3. Bai, E.-w., Li, K., Zhao, W., and Xu, W. (2014), “Kernel based approaches to local nonlinear non-parametric variable selection,” Automatica, 50, 100–113.
  4. Bentley, J. L. (1975), “Multidimensional binary search trees used for associative searching,” Communications of the ACM, 18, 509–517.
  5. Borgonovo, E. (2007), “A new uncertainty importance measure,” Reliability Engineering & System Safety, 92, 771–784.
  6. Borgonovo, E., Castaings, W., and Tarantola, S. (2011), “Moment independent importance measures: New results and analytical test cases,” Risk Analysis: An International Journal, 31, 404–428.
  7. Borgonovo, E. and Plischke, E. (2016), “Sensitivity analysis: A review of recent advances,” European Journal of Operational Research, 248, 869–887.
  8. Breiman, L. (2001), “Random forests,” Machine Learning, 45, 5–32.
  9. Broto, B., Bachoc, F., and Depecker, M. (2020), “Variance reduction for estimation of Shapley effects and adaptation to unknown input distribution,” SIAM/ASA Journal on Uncertainty Quantification, 8, 693–716.
  10. Chastaing, G., Gamboa, F., and Prieur, C. (2015), “Generalized Sobol sensitivity indices for dependent variables: numerical methods,” Journal of Statistical Computation and Simulation, 85, 1306–1333.
  11. Chatterjee, S. (2021), “A new coefficient of correlation,” Journal of the American Statistical Association, 116, 2009–2022.
  12. Chen, W., Jin, R., and Sudjianto, A. (2005), “Analytical variance-based global sensitivity analysis in simulation-based design under uncertainty,” Journal of Mechanical Design, 127, 875–886.
  13. Cukierski, W. (2012), “Titanic - Machine Learning from Disaster,” Retrieved November 30, 2023 from https://kaggle.com/competitions/titanic.
  14. Darlington, R. B. (1968), “Multiple regression in psychological research and practice,” Psychological Bulletin, 69, 161.
  15. Efron, B. and Stein, C. (1981), “The jackknife estimate of variance,” The Annals of Statistics, 9, 586–596.
  16. Efroymson, M. A. (1960), “Multiple regression analysis,” in Mathematical Methods for Digital Computers, 191–203.
  17. Fisher, A., Rudin, C., and Dominici, F. (2019), “All Models are Wrong, but Many are Useful: Learning a Variable’s Importance by Studying an Entire Class of Prediction Models Simultaneously.” J. Mach. Learn. Res., 20, 1–81.
  18. Friedman, J., Hastie, T., and Tibshirani, R. (2010), “Regularization Paths for Generalized Linear Models via Coordinate Descent,” Journal of Statistical Software, 33, 1–22, URL https://www.jstatsoft.org/v33/i01/.
  19. Friedman, J. H. (1991), “Multivariate adaptive regression splines,” The Annals of Statistics, 19, 1–67.
  20. — (2001), “Greedy function approximation: a gradient boosting machine,” The Annals of Statistics, 29, 1189–1232.
  21. Genuer, R., Poggi, J.-M., and Tuleau-Malot, C. (2010), “Variable selection using random forests,” Pattern Recognition Letters, 31, 2225–2236.
  22. — (2015), “VSURF: an R package for variable selection using random forests,” The R Journal, 7, 19–33.
  23. Gevrey, M., Dimopoulos, I., and Lek, S. (2003), “Review and comparison of methods to study the contribution of variables in artificial neural network models,” Ecological Modelling, 160, 249–264.
  24. Grömping, U. (2007), “Estimators of relative importance in linear regression based on variance decomposition,” The American Statistician, 61, 139–147.
  25. — (2009), “Variable importance assessment in regression: linear regression versus random forest,” The American Statistician, 63, 308–319.
  26. Hart, J. and Gremaud, P. A. (2018), “An approximation theoretic perspective of Sobol' indices with dependent variables,” International Journal for Uncertainty Quantification, 8.
  27. Hastie, T. J. (2017), “Generalized additive models,” in Statistical Models in S, Routledge, 249–307.
  28. Homma, T. and Saltelli, A. (1996), “Importance measures in global sensitivity analysis of nonlinear models,” Reliability Engineering & System Safety, 52, 1–17.
  29. Huang, Z., Deb, N., and Sen, B. (2022), “Kernel partial correlation coefficient—a measure of conditional dependence,” The Journal of Machine Learning Research, 23, 9699–9756.
  30. Ishigami, T. and Homma, T. (1990), “An importance quantification technique in uncertainty analysis for computer models,” in Proceedings of the First International Symposium on Uncertainty Modeling and Analysis, IEEE.
  31. Jansen, M. J. (1999), “Analysis of variance designs for model output,” Computer Physics Communications, 117, 35–43.
  32. Kucherenko, S., Tarantola, S., and Annoni, P. (2012), “Estimation of global sensitivity indices for models with dependent variables,” Computer Physics Communications, 183, 937–946.
  33. Kuhn, M., Wing, J., Weston, S., Williams, A., Keefer, C., Engelhardt, A., Cooper, T., Mayer, Z., Kenkel, B., Team, R. C., et al. (2020), “Package ‘caret’,” The R Journal, 223.
  34. LeCun, Y., Bengio, Y., and Hinton, G. (2015), “Deep learning,” Nature, 521, 436–444.
  35. Li, G., Rabitz, H., Yelvington, P. E., Oluwole, O. O., Bacon, F., Kolb, C. E., and Schoendorf, J. (2010), “Global sensitivity analysis for systems with independent and/or correlated inputs,” The Journal of Physical Chemistry A, 114, 6022–6032.
  36. Li, R., Zhong, W., and Zhu, L. (2012), “Feature screening via distance correlation learning,” Journal of the American Statistical Association, 107, 1129–1139.
  37. Lin, Y. and Zhang, H. H. (2006), “Component selection and smoothing in multivariate nonparametric regression,” The Annals of Statistics, 34, 2272–2297.
  38. Lundberg, S. M. and Lee, S.-I. (2017), “A unified approach to interpreting model predictions,” Advances in neural information processing systems, 30.
  39. Mase, M., Owen, A. B., and Seiler, B. B. (2022), “Variable importance without impossible data,” arXiv preprint arXiv:2205.15750.
  40. Morris, M. D. (1991), “Factorial sampling plans for preliminary computational experiments,” Technometrics, 33, 161–174.
  41. Nash, W. J., Sellers, T. L., Talbot, S. R., Cawthorn, A. J., and Ford, W. B. (1994), “The population biology of abalone (Haliotis species) in Tasmania. I. Blacklip abalone (H. rubra) from the north coast and islands of Bass Strait,” Sea Fisheries Division, Technical Report, 48, p. 411.
  42. Oakley, J. E. and O’Hagan, A. (2004), “Probabilistic sensitivity analysis of complex models: a Bayesian approach,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), 66, 751–769.
  43. Owen, A. B. (2014), “Sobol' indices and Shapley value,” SIAM/ASA Journal on Uncertainty Quantification, 2, 245–251.
  44. Plischke, E., Borgonovo, E., and Smith, C. L. (2013), “Global sensitivity measures from given data,” European Journal of Operational Research, 226, 536–550.
  45. Razavi, S., Jakeman, A., Saltelli, A., Prieur, C., Iooss, B., Borgonovo, E., Plischke, E., Piano, S. L., Iwanaga, T., Becker, W., et al. (2021), “The future of sensitivity analysis: An essential discipline for systems modeling and policy support,” Environmental Modelling & Software, 137, 104954.
  46. Saltelli, A., Annoni, P., Azzini, I., Campolongo, F., Ratto, M., and Tarantola, S. (2010), “Variance based sensitivity analysis of model output. Design and estimator for the total sensitivity index,” Computer Physics Communications, 181, 259–270.
  47. Sobol’, I. M. (1993), “On sensitivity estimation for nonlinear mathematical models,” Mathematical Modeling and Computational Experiments, 1, 407–414.
  48. — (2001), “Global sensitivity indices for nonlinear mathematical models and their Monte Carlo estimates,” Mathematics and Computers in Simulation, 55, 271–280.
  49. Sobol’, I. M. and Kucherenko, S. (2009), “Derivative based global sensitivity measures and their link with global sensitivity indices,” Mathematics and Computers in Simulation, 79, 3009–3017.
  50. Song, E., Nelson, B. L., and Staum, J. (2016), “Shapley effects for global sensitivity analysis: Theory and computation,” SIAM/ASA Journal on Uncertainty Quantification, 4, 1060–1083.
  51. Székely, G. J., Rizzo, M. L., and Bakirov, N. K. (2007), “Measuring and testing dependence by correlation of distances,” The Annals of Statistics, 35, 2769–2794.
  52. Thodberg, H. H. (1993), “Ace of Bayes: Application of neural networks with pruning,” Technical report.
  53. Tibshirani, R. (1996), “Regression shrinkage and selection via the lasso,” Journal of the Royal Statistical Society Series B: Statistical Methodology, 58, 267–288.
  54. Wei, P., Lu, Z., and Song, J. (2015), “Variable importance analysis: A comprehensive review,” Reliability Engineering & System Safety, 142, 399–432.
  55. Williamson, B. D., Gilbert, P. B., Carone, M., and Simon, N. (2021), “Nonparametric variable importance assessment using machine learning techniques,” Biometrics, 77, 9–22.
  56. Williamson, B. D., Gilbert, P. B., Simon, N. R., and Carone, M. (2023), “A general framework for inference on algorithm-agnostic variable importance,” Journal of the American Statistical Association, 118, 1645–1658.
  57. Wojtas, M. and Chen, K. (2020), “Feature importance ranking for deep learning,” Advances in Neural Information Processing Systems, 33, 5105–5114.
  58. Wolberg, W. H. and Mangasarian, O. L. (1990), “Multisurface method of pattern separation for medical diagnosis applied to breast cytology,” Proceedings of the National Academy of Sciences, 87, 9193–9196.
  59. Wright, M. N. and Ziegler, A. (2017), “ranger: A Fast Implementation of Random Forests for High Dimensional Data in C++ and R,” Journal of Statistical Software, 77, 1–17.
  60. Yang, L., Lv, S., and Wang, J. (2016), “Model-free variable selection in reproducing kernel Hilbert space,” The Journal of Machine Learning Research, 17, 2885–2908.
  61. Ye, G.-B. and Xie, X. (2012), “Learning sparse gradients for variable selection and dimension reduction,” Machine Learning, 87, 303–355.
  62. Yeh, I.-C. (1998), “Modeling of strength of high-performance concrete using artificial neural networks,” Cement and Concrete Research, 28, 1797–1808.