Trust Regions for Explanations via Black-Box Probabilistic Certification (2402.11168v3)
Abstract: Given the black-box nature of machine learning models, a plethora of explainability methods have been developed to decipher the factors behind individual decisions. In this paper, we introduce the novel problem of black-box (probabilistic) explanation certification. We ask: given a black-box model with only query access, an explanation for an example, and a quality metric (viz. fidelity, stability), can we find the largest hypercube (i.e., $\ell_{\infty}$ ball) centered at the example such that when the explanation is applied to all examples within the hypercube, a quality criterion is met with high probability (viz. fidelity greater than some value)? Being able to efficiently find such a \emph{trust region} has multiple benefits: i) insight into model behavior in a \emph{region}, with a \emph{guarantee}; ii) ascertained \emph{stability} of the explanation; iii) \emph{explanation reuse}, which can save time, energy, and money by not having to find explanations for every example; and iv) a possible \emph{meta-metric} for comparing explanation methods. Our contributions include formalizing this problem, proposing solutions, providing computable theoretical guarantees for these solutions, and experimentally demonstrating their efficacy on synthetic and real data.
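To make the certification problem concrete, below is a minimal sketch in Python of a naive sampling-based certifier. This is not the paper's actual algorithm (the abstract does not specify it); it only illustrates the problem statement. The callable `model`, the linear explanation `(w, b)`, the fidelity definition, and the assumption that certifiability is monotone in the hypercube half-width (which justifies the binary search) are all illustrative assumptions.

```python
import numpy as np

def fidelity(model, x, w, b):
    """Illustrative fidelity metric (assumed): agreement between a local
    linear explanation w.x + b and the black-box model's score at x."""
    return 1.0 - abs(model(x) - (np.dot(w, x) + b))

def certify(model, x0, w, b, theta, tau=0.9, delta=0.05, n=1000, seed=None):
    """Monte Carlo check: sample points uniformly from the l_inf ball of
    half-width theta around x0 and test whether the estimated probability
    that fidelity >= tau is at least 1 - delta."""
    rng = np.random.default_rng(seed)
    pts = x0 + rng.uniform(-theta, theta, size=(n, x0.size))
    ok = np.mean([fidelity(model, p, w, b) >= tau for p in pts])
    return ok >= 1.0 - delta

def largest_certified_width(model, x0, w, b, theta_max=1.0, iters=20):
    """Binary search for the largest half-width that passes the check,
    assuming certifiability is monotone in theta."""
    lo, hi = 0.0, theta_max
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if certify(model, x0, w, b, mid):
            lo = mid  # mid certifies: try a larger region
        else:
            hi = mid  # mid fails: shrink the region
    return lo
```

With query-only access, `model` can be any callable returning a scalar score. Note that this empirical check carries no formal certificate on its own; the paper's contribution is precisely to provide computable probabilistic guarantees for such trust regions.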