Concept Distillation: Leveraging Human-Centered Explanations for Model Improvement (2311.15303v1)
Abstract: Humans understand in terms of abstract concepts rather than raw features. Recent interpretability research has accordingly focused on human-centered concept explanations of neural networks. Concept Activation Vectors (CAVs) estimate a model's sensitivity, and possible biases, with respect to a given concept. In this paper, we extend CAVs from post-hoc analysis to ante-hoc training in order to reduce model bias through fine-tuning with an additional Concept Loss. Whereas prior work defined concepts on the final layer of the network, we generalize the definition to intermediate layers using class prototypes; this facilitates class learning in the last convolutional layer, which is known to be the most informative. We also introduce Concept Distillation, which creates richer concepts by using a pre-trained, knowledgeable model as the teacher. Our method can sensitize or desensitize a model towards concepts. We show applications of concept-sensitive training to debiasing several classification problems, and we use concepts to induce prior knowledge into Intrinsic Image Decomposition (IID), a reconstruction problem. Concept-sensitive training can improve model interpretability, reduce biases, and induce prior knowledge. Code and further details are available at https://avani17101.github.io/Concept-Distilllation/.
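The abstract describes fine-tuning with an additional Concept Loss built from CAVs, but the exact formulation is not given here. Below is a minimal PyTorch-style sketch of one way such a loss can be wired into training, assuming a TCAV-style CAV (a unit vector learned at an intermediate layer by a linear probe separating concept from random activations) and assuming the concept term is implemented as an alignment penalty between the CAV and the gradient of the class logit at that layer. The names `cav_concept_loss`, `train_step`, `lam`, and the hook setup are illustrative, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def cav_concept_loss(model, layer, cav, images, labels):
    """Squared cosine alignment between the per-sample gradient of the class
    logit at `layer` and a unit-norm concept activation vector `cav`.

    Assumption: `cav` comes from a TCAV-style linear probe on flattened
    activations of the same layer. Minimizing this term desensitizes the
    model to the concept; negating it would sensitize the model instead.
    """
    acts = {}
    handle = layer.register_forward_hook(lambda mod, inp, out: acts.update(a=out))
    logits = model(images)
    handle.remove()

    # Scalar sum of the ground-truth class logits, so autograd gives
    # per-sample gradients w.r.t. the captured intermediate activations.
    class_logits = logits.gather(1, labels.unsqueeze(1)).sum()
    grads, = torch.autograd.grad(class_logits, acts["a"], create_graph=True)
    grads = grads.flatten(1)                                   # (batch, d)
    cos = F.cosine_similarity(grads, cav.unsqueeze(0), dim=1)  # (batch,)
    return cos.pow(2).mean()

def train_step(model, layer, cav, images, labels, optimizer, lam=0.5):
    """One fine-tuning step: task loss plus a weighted concept loss.

    Kept simple with two forward passes; a single shared pass would also work.
    `lam` is a hyperparameter trading off task accuracy against concept control.
    """
    optimizer.zero_grad()
    task_loss = F.cross_entropy(model(images), labels)
    c_loss = cav_concept_loss(model, layer, cav, images, labels)
    (task_loss + lam * c_loss).backward()
    optimizer.step()
    return task_loss.item(), c_loss.item()
```

In this sketch, sensitizing the model to a desired concept would simply flip the sign of the concept term, and richer, teacher-derived CAVs from Concept Distillation would be passed in as `cav`.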
Authors: Avani Gupta, Saurabh Saini, P J Narayanan