On the explainable properties of 1-Lipschitz Neural Networks: An Optimal Transport Perspective

Published 14 Jun 2022 in cs.AI, cs.CR, cs.CV, cs.LG, and stat.ML | arXiv:2206.06854v3

Abstract: Input gradients play a pivotal role in a variety of applications, including adversarial attack algorithms for evaluating model robustness, explainable AI techniques for generating Saliency Maps, and counterfactual explanations. However, Saliency Maps generated by traditional neural networks are often noisy and provide limited insight. In this paper, we demonstrate that, on the contrary, the Saliency Maps of 1-Lipschitz neural networks, learned with the dual loss of an optimal transportation problem, exhibit desirable XAI properties: they are highly concentrated on the essential parts of the image, with low noise, significantly outperforming state-of-the-art explanation approaches across various models and metrics. We also prove that these maps align unprecedentedly well with human explanations on ImageNet. To explain the particularly beneficial properties of the Saliency Map for such models, we prove that this gradient encodes both the direction of the transportation plan and the direction towards the nearest adversarial attack. Following the gradient down to the decision boundary is then no longer an adversarial attack, but rather a counterfactual explanation that explicitly transports the input from one class to another. Thus, learning with such a loss jointly optimizes the classification objective and the alignment of the gradient, i.e. the Saliency Map, with the transportation plan direction. These networks were previously known to be certifiably robust by design; we demonstrate that they also scale well to large problems and models, and that they lend themselves to explainability through a fast and straightforward method.
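
A minimal sketch of the idea summarized above, assuming a PyTorch-style 1-Lipschitz binary classifier whose forward pass returns a scalar logit. The function and parameter names (`model`, `saliency_map`, `counterfactual`, `step`, `max_steps`) are illustrative, not the authors' implementation: the raw input gradient is read off directly as the saliency map, and walking the input along that gradient until the logit changes sign plays the role of a counterfactual rather than an adversarial perturbation.

```python
import torch

def saliency_map(model, x):
    """Raw input gradient of a (hypothetical) 1-Lipschitz binary classifier
    with a scalar logit output; for such models this gradient is already a
    low-noise saliency map."""
    x = x.clone().detach().requires_grad_(True)
    logit = model(x).sum()
    (grad,) = torch.autograd.grad(logit, x)
    return grad.detach()

def counterfactual(model, x, step=0.05, max_steps=200):
    """Follow the gradient down to the decision boundary. Because the model
    is 1-Lipschitz, |logit| lower-bounds the distance to the boundary and the
    gradient points along the transport-plan direction, so the endpoint acts
    as a counterfactual. Step size and iteration budget are illustrative."""
    with torch.no_grad():
        start_sign = torch.sign(model(x).sum())
    cf = x.clone().detach()
    for _ in range(max_steps):
        cf.requires_grad_(True)
        logit = model(cf).sum()
        if torch.sign(logit) != start_sign:   # logit sign flipped: boundary crossed
            return cf.detach()
        (grad,) = torch.autograd.grad(logit, cf)
        # Move against the gradient of the currently predicted class.
        cf = (cf - step * start_sign * grad).detach()
    return cf.detach()
```

In practice one would first verify that `model` is actually 1-Lipschitz (e.g. built from orthogonal or spectrally normalized layers and trained with the optimal-transport dual loss) before reading its gradient this way; for an unconstrained network the same walk would typically produce a noisy adversarial perturbation instead.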
