Understanding Multimodal Deep Neural Networks: A Concept Selection View (2404.08964v1)

Published 13 Apr 2024 in cs.CV, cs.AI, and cs.LG

Abstract: Multimodal deep neural networks, exemplified by CLIP, have enabled a rich range of downstream applications owing to their excellent performance, making the decision-making process of CLIP an essential research topic. Because of its complex structure and massive pre-training data, CLIP is often regarded as a black-box model that is too difficult to understand and interpret. Concept-based models map the black-box visual representations extracted by deep neural networks onto a set of human-understandable concepts and use these concepts to make predictions, enhancing the transparency of the decision-making process. However, such methods rely on datasets labeled with fine-grained attributes by domain experts, which incurs high costs and introduces excessive human prior knowledge and bias. In this paper, we observe a long-tail distribution of concepts, based on which we propose a two-stage Concept Selection Model (CSM) to mine core concepts without introducing any human priors. A concept greedy rough selection algorithm first extracts head concepts, and a concept mask fine selection method then extracts the core concepts. Experiments show that our approach achieves performance comparable to end-to-end black-box models, and human evaluation demonstrates that the concepts discovered by our method are interpretable and comprehensible to humans.
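
The two-stage selection procedure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the concept-score matrix `S`, the budget of 20 head concepts, the nearest-centroid probe, and the leave-one-out scoring used in place of the learned concept mask are all assumptions made purely for illustration.

```python
# Hedged sketch of a two-stage concept selection (illustrative, not the paper's code).
# Assumes we already have an image-by-concept similarity matrix S (e.g. CLIP image
# embeddings dotted with concept text embeddings) and class labels y.
import numpy as np

rng = np.random.default_rng(0)
n_images, n_concepts, n_classes = 200, 50, 5
S = rng.normal(size=(n_images, n_concepts))      # concept activation scores (synthetic here)
y = rng.integers(0, n_classes, size=n_images)    # class labels (synthetic here)

def probe_accuracy(cols):
    """Accuracy of a nearest-class-centroid probe using only the selected concept columns."""
    X = S[:, cols]
    centroids = np.stack([X[y == c].mean(axis=0) for c in range(n_classes)])
    pred = np.argmin(((X[:, None, :] - centroids[None]) ** 2).sum(-1), axis=1)
    return (pred == y).mean()

# Stage 1: greedy rough selection of "head" concepts — repeatedly add the concept
# that most improves the probe's accuracy.
selected, remaining = [], list(range(n_concepts))
for _ in range(20):                              # hypothetical head-concept budget
    best = max(remaining, key=lambda c: probe_accuracy(selected + [c]))
    selected.append(best)
    remaining.remove(best)

# Stage 2: fine selection of "core" concepts. The paper uses a concept mask; here a
# simple leave-one-out accuracy drop serves as a stand-in for that idea.
full_acc = probe_accuracy(selected)
drops = {c: full_acc - probe_accuracy([k for k in selected if k != c]) for c in selected}
core = [c for c in selected if drops[c] > 0]     # keep concepts the probe actually relies on
print(f"head concepts: {len(selected)}, core concepts: {len(core)}")
```

In the paper, the second stage learns the concept mask rather than scoring concepts one at a time; the leave-one-out pruning above is only a simple proxy for that mask-based fine selection.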

Authors (4)
  1. Chenming Shang (9 papers)
  2. Hengyuan Zhang (34 papers)
  3. Hao Wen (52 papers)
  4. Yujiu Yang (155 papers)
Citations (4)
