
Use Perturbations when Learning from Explanations (2303.06419v3)

Published 11 Mar 2023 in cs.LG

Abstract: Machine learning from explanations (MLX) is an approach to learning that uses human-provided explanations of relevant or irrelevant features for each input to ensure that model predictions are right for the right reasons. Existing MLX approaches rely on local model interpretation methods and require strong model smoothing to align model and human explanations, leading to sub-optimal performance. We recast MLX as a robustness problem, where human explanations specify a lower dimensional manifold from which perturbations can be drawn, and show both theoretically and empirically how this approach alleviates the need for strong model smoothing. We consider various approaches to achieving robustness, leading to improved performance over prior MLX methods. Finally, we show how to combine robustness with an earlier MLX method, yielding state-of-the-art results on both synthetic and real-world benchmarks.
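The recasting described in the abstract, treating human explanations as a specification of input directions the model's predictions should be invariant to, suggests a simple training recipe. The following is a minimal sketch, assuming a PyTorch classifier and a per-feature binary mask marking human-labeled irrelevant features; the function name `mlx_perturbation_loss`, the random Gaussian perturbations, and the KL consistency term are illustrative assumptions rather than the paper's exact objective (the paper considers several ways of achieving robustness).

```python
import torch
import torch.nn.functional as F

def mlx_perturbation_loss(model, x, y, irrelevant_mask, eps=0.1, n_samples=4):
    """Sketch of an MLX-as-robustness objective (illustrative, not the paper's
    exact formulation). `irrelevant_mask` is 1 where a human marked a feature
    as irrelevant and 0 where it is relevant; perturbations are restricted to
    the irrelevant features so the model is penalized for relying on them."""
    # Standard task loss on the clean input.
    loss = F.cross_entropy(model(x), y)

    # Consistency term: predictions should be stable under perturbations drawn
    # from the lower-dimensional manifold spanned by irrelevant features.
    clean_probs = F.softmax(model(x).detach(), dim=-1)
    for _ in range(n_samples):
        noise = eps * torch.randn_like(x) * irrelevant_mask
        pert_log_probs = F.log_softmax(model(x + noise), dim=-1)
        loss = loss + F.kl_div(pert_log_probs, clean_probs,
                               reduction="batchmean") / n_samples
    return loss
```

In practice `eps` and the number of perturbation samples would be tuned, and the random perturbations could be replaced by a worst-case (adversarial) search restricted to the masked features, in line with the robustness variants the paper compares.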

Authors (4)
  1. Juyeon Heo
  2. Vihari Piratla
  3. Matthew Wicker
  4. Adrian Weller
