Understanding Multimodal Deep Neural Networks: A Concept Selection View (2404.08964v1)
Abstract: Multimodal deep neural networks, represented by CLIP, have enabled rich downstream applications owing to their excellent performance, which makes understanding CLIP's decision-making process an essential research topic. Because of its complex structure and massive pre-training data, CLIP is often regarded as a black-box model that is too difficult to understand and interpret. Concept-based models map the black-box visual representations extracted by deep neural networks onto a set of human-understandable concepts and use those concepts to make predictions, making the decision-making process more transparent. However, these methods rely on datasets annotated with fine-grained attributes using expert knowledge, which is costly and introduces excessive human prior knowledge and bias. In this paper, we observe that concepts follow a long-tail distribution, and based on this observation we propose a two-stage Concept Selection Model (CSM) that mines core concepts without introducing any human priors. A concept greedy rough selection algorithm first extracts the head concepts, and a concept mask fine selection method then extracts the core concepts. Experiments show that our approach achieves performance comparable to end-to-end black-box models, and a human evaluation demonstrates that the concepts discovered by our method are interpretable and comprehensible to humans.
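The abstract only outlines the two stages, so the following is a minimal, hypothetical sketch of the general idea rather than the CSM algorithm itself. It assumes that concept scores are cosine similarities between a CLIP-style image embedding and concept text embeddings, that "head" concepts are the ones that most often rank as an image's top-k concept (cut off once they cover a chosen fraction of all hits), and that fine selection can be pictured as a binary mask over concept scores feeding a linear classifier. The function names `concept_scores`, `greedy_head_concepts`, and `predict`, and the parameters `top_k` and `coverage`, are illustrative assumptions, not names from the paper.

```python
import math
from collections import Counter

# Hypothetical sketch: these names and the scoring/selection rules are
# illustrative assumptions, not the CSM implementation from the paper.

def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)


def concept_scores(image_emb, concept_embs):
    """Project one image embedding onto concept space: one score per concept."""
    return [cosine(image_emb, c) for c in concept_embs]


def greedy_head_concepts(image_embs, concept_embs, top_k=1, coverage=0.8):
    """Assumed rough selection: count how often each concept is among an
    image's top-k concepts, then greedily keep the most frequent concepts
    until they account for `coverage` of all top-k hits."""
    counts = Counter()
    for img in image_embs:
        scores = concept_scores(img, concept_embs)
        ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
        counts.update(ranked[:top_k])
    total = sum(counts.values())
    head, covered = [], 0
    for idx, n in counts.most_common():
        if covered >= coverage * total:
            break  # the remaining concepts form the long tail
        head.append(idx)
        covered += n
    return head


def predict(image_emb, concept_embs, weights, mask):
    """Assumed fine-selection step: a binary mask zeroes out unselected
    concepts, and a linear layer maps masked concept scores to class logits.
    weights[c][j] links class c to concept j; returns the argmax class."""
    scores = concept_scores(image_emb, concept_embs)
    masked = [s * m for s, m in zip(scores, mask)]
    logits = [sum(w * s for w, s in zip(row, masked)) for row in weights]
    return max(range(len(logits)), key=lambda c: logits[c])


# Toy 2-D "embeddings" standing in for CLIP image/text features.
concepts = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
images = [[1.0, 0.1], [1.0, 0.0], [0.9, 0.1], [0.1, 1.0]]
print(greedy_head_concepts(images, concepts, coverage=0.75))  # [0]
print(greedy_head_concepts(images, concepts, coverage=0.8))   # [0, 1]
weights = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
print(predict([1.0, 0.1], concepts, weights, mask=[1, 1, 0]))  # 0
print(predict([1.0, 0.1], concepts, weights, mask=[0, 1, 0]))  # 1
```

In the toy data, concept 0 is the top match for three of the four images, so with `coverage=0.75` it is the only head concept; masking it out flips the prediction for the first image to class 1. A real pipeline would use actual CLIP embeddings and learn the classifier weights and the mask, which this sketch does not attempt.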
- Chenming Shang
- Hengyuan Zhang
- Hao Wen
- Yujiu Yang