Deep spatial context: when attention-based models meet spatial regression (2401.10044v2)
Abstract: We propose the 'Deep spatial context' (DSCon) method for investigating attention-based vision models through the concept of spatial context. The method was inspired by histopathologists, but it can be applied to various domains. DSCon quantifies the role of spatial context with three Spatial Context Measures, $SCM_{features}$, $SCM_{targets}$, and $SCM_{residuals}$, which distinguish whether the spatial context is observable within the features of neighboring regions, their target values (attention scores), or the residuals, respectively. This is achieved by integrating spatial regression into the pipeline. DSCon can be used to test research questions about the role of spatial context. The experiments reveal that spatial relationships are much stronger when classifying tumor lesions than when classifying normal tissue. Moreover, the larger the neighborhood taken into account within the spatial regression, the less valuable the contextual information is. Furthermore, the spatial context measure is largest when considered within the feature space, as opposed to the targets and residuals.
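The idea of looking for spatial context in the features of neighboring regions can be illustrated with a small sketch. The abstract does not give the formulas for the Spatial Context Measures, so the code below is not the DSCon definition of $SCM_{features}$. It only assumes that patches lie on a regular grid, builds a hypothetical row-standardised neighborhood matrix `W`, and compares an ordinary least-squares fit of the attention scores on a patch's own features with a fit that also includes the neighbor-averaged features `W @ X`. All function names, the `radius` parameter, and the synthetic data are assumptions made for illustration.

```python
import numpy as np


def grid_weights(rows, cols, radius=1):
    """Row-standardised spatial weights for patches on a rows x cols grid.

    Two patches are neighbours when their Chebyshev distance is <= radius.
    Assumes the grid has at least two patches, so every row has a neighbour.
    """
    n = rows * cols
    W = np.zeros((n, n))
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            for dr in range(-radius, radius + 1):
                for dc in range(-radius, radius + 1):
                    rr, cc = r + dr, c + dc
                    if (dr, dc) != (0, 0) and 0 <= rr < rows and 0 <= cc < cols:
                        W[i, rr * cols + cc] = 1.0
    return W / W.sum(axis=1, keepdims=True)


def r_squared(X, y):
    """Coefficient of determination of an OLS fit with an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    centred = y - y.mean()
    return 1.0 - (resid @ resid) / (centred @ centred)


def feature_context_gain(X, y, W):
    """Gain in R^2 from adding neighbour-averaged features W @ X (illustrative only)."""
    return r_squared(np.column_stack([X, W @ X]), y) - r_squared(X, y)


# Synthetic example: 8 x 8 patches, 3 features per patch (all values are made up).
rng = np.random.default_rng(0)
rows, cols = 8, 8
X = rng.normal(size=(rows * cols, 3))
W = grid_weights(rows, cols, radius=1)
noise = 0.05 * rng.normal(size=rows * cols)

# Scores that depend on neighbouring features versus scores that do not.
y_context = X @ np.array([1.0, -0.5, 0.3]) + (W @ X) @ np.array([2.0, 1.0, -1.0]) + noise
y_local = X @ np.array([1.0, -0.5, 0.3]) + noise

gain_context = feature_context_gain(X, y_context, W)
gain_local = feature_context_gain(X, y_local, W)
```

In this toy setting the gain is expected to be clearly larger for `y_context` than for `y_local`, since only the former was generated from neighboring features. Increasing `radius` averages over more patches, which is one simple way to vary the neighborhood size mentioned in the abstract.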