ProtoP-OD: Explainable Object Detection with Prototypical Parts (2402.19142v1)
Abstract: Interpretation and visualization of the behavior of detection transformers tend to highlight the locations in the image that the model attends to, but they provide limited insight into the semantics that the model is focusing on. This paper introduces an extension to detection transformers that constructs prototypical local features and uses them in object detection. These custom features, which we call prototypical parts, are designed to be mutually exclusive and to align with the classifications of the model. The proposed extension consists of a bottleneck module, the prototype neck, that computes a discretized representation of prototype activations, and a new loss term that matches prototypes to object classes. This setup leads to interpretable representations in the prototype neck, allowing visual inspection of the image content perceived by the model and a better understanding of the model's reliability. We show experimentally that our method incurs only a limited performance penalty, and we provide examples that demonstrate the quality of the explanations provided by our method, which we argue outweighs the performance penalty.
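To make the idea of a "discretized representation of prototype activations" concrete, here is a minimal sketch in plain Python. It is an illustration only and not the paper's implementation: the dot-product similarity, the softmax, the hard argmax, and every name (`prototype_activations`, `discretize`, `prototype_neck`, the toy vectors) are assumptions, since the abstract does not specify these details. The training-time loss that matches prototypes to object classes is not modeled here.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def prototype_activations(feature, prototypes):
    """Soft activations of one local feature over all prototypes.

    Assumption: similarity is a plain dot product followed by softmax.
    """
    scores = [sum(f * p for f, p in zip(feature, proto)) for proto in prototypes]
    return softmax(scores)


def discretize(activations):
    """Keep only the most active prototype as a one-hot vector.

    This makes the activations mutually exclusive per location.
    """
    winner = max(range(len(activations)), key=activations.__getitem__)
    return [1.0 if k == winner else 0.0 for k in range(len(activations))]


def prototype_neck(features, prototypes):
    """Hypothetical bottleneck: map each local feature to a one-hot prototype code."""
    return [discretize(prototype_activations(f, prototypes)) for f in features]


# Toy data (made up): three 2-D prototypes and three local features.
prototypes = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
features = [[2.0, 0.1], [0.2, 3.0], [-1.5, -2.0]]

codes = prototype_neck(features, prototypes)
for feature, code in zip(features, codes):
    print(feature, "->", code)
```

Because each location ends up with exactly one active prototype, the resulting code map can be displayed as a per-location label image, which is the kind of visual inspection the abstract refers to. A trainable version would need a differentiable surrogate for the argmax step; which one the authors use is not stated in the abstract.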