
Approximating CNNs with Bag-of-local-Features models works surprisingly well on ImageNet (1904.00760v1)

Published 20 Mar 2019 in cs.CV, cs.LG, and stat.ML

Abstract: Deep Neural Networks (DNNs) excel on many complex perceptual tasks but it has proven notoriously difficult to understand how they reach their decisions. We here introduce a high-performance DNN architecture on ImageNet whose decisions are considerably easier to explain. Our model, a simple variant of the ResNet-50 architecture called BagNet, classifies an image based on the occurrences of small local image features without taking into account their spatial ordering. This strategy is closely related to the bag-of-feature (BoF) models popular before the onset of deep learning and reaches a surprisingly high accuracy on ImageNet (87.6% top-5 for 33 x 33 px features and Alexnet performance for 17 x 17 px features). The constraint on local features makes it straight-forward to analyse how exactly each part of the image influences the classification. Furthermore, the BagNets behave similar to state-of-the art deep neural networks such as VGG-16, ResNet-152 or DenseNet-169 in terms of feature sensitivity, error distribution and interactions between image parts. This suggests that the improvements of DNNs over previous bag-of-feature classifiers in the last few years is mostly achieved by better fine-tuning rather than by qualitatively different decision strategies.

Citations (542)

Summary

  • The paper demonstrates that using localized, non-spatial features, BagNets achieve up to 87.6% top-5 accuracy on ImageNet.
  • The study shows that BagNets are markedly easier to interpret than conventional CNNs, because decisions rest solely on isolated image patches whose contributions can be read off directly.
  • The authors highlight that this simplified decision strategy mirrors trends in conventional CNNs, prompting future research in attribution and causality.

Approximating CNNs with Bag-of-local-Features Models on ImageNet

Summary

The paper presents an in-depth exploration of a novel deep neural network (DNN) architecture, termed BagNet, which is a variant of ResNet-50. BagNets offer an interpretable framework by classifying images based on local image features without regard to their spatial ordering, reminiscent of traditional Bag-of-Features (BoF) models. The authors demonstrate that BagNet achieves strong performance on ImageNet: 87.6% top-5 accuracy using 33x33 pixel features, and 80.5% with 17x17 pixel features, comparable to AlexNet. This approach makes it possible to analyse how each part of an image contributes to the classification, giving the model's decisions far greater transparency.
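The decision rule described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: `patch_classifier` is a hypothetical stand-in for the truncated-receptive-field ResNet-50 that BagNet actually uses, and the patch stride is an assumption for the sketch.

```python
import numpy as np

def bagnet_predict(image, patch_classifier, patch_size=33, stride=8):
    """Classify an image by averaging class logits over local patches.

    Spatial ordering is deliberately ignored: the image-level score is
    just the mean of per-patch evidence, as in a bag-of-features model.

    image: (H, W, C) array.
    patch_classifier: maps a (patch_size, patch_size, C) patch to a
        vector of class logits (hypothetical stand-in for BagNet's
        restricted-receptive-field ResNet-50 trunk).
    """
    H, W, _ = image.shape
    logits = []
    for y in range(0, H - patch_size + 1, stride):
        for x in range(0, W - patch_size + 1, stride):
            patch = image[y:y + patch_size, x:x + patch_size]
            logits.append(patch_classifier(patch))
    # Order-independent pooling: permuting the patches cannot change
    # the prediction, which is the defining BagNet constraint.
    return np.mean(logits, axis=0)
```

Because the pooling is a plain mean, shuffling the patches (or the image regions they came from) leaves the prediction unchanged, which is exactly the "no spatial ordering" property the paper exploits.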

Key Findings

  • Interpretability and Accuracy: BagNets enable easier interpretation of model decisions due to their reliance on isolated local features without considering spatial configurations. Despite this simplification, they maintain competitive accuracy on ImageNet.
  • Comparison with Conventional DNNs: The behavior of BagNets mirrors that of high-performance networks such as VGG-16, ResNet-152, and DenseNet-169, suggesting that enhancements in DNNs may result more from fine-tuning and optimization than from fundamentally distinct decision strategies.
  • Similar Decision-Making: The paper highlights that BagNets, while limited to localized features, share error distribution and interaction characteristics with conventional DNNs, indicating that many state-of-the-art networks rely on local statistical regularities.

Implications and Future Directions

The findings imply that training strategies and architecture designs focused solely on improving recognition performance without considering interpretability might overlook simpler yet effective decision patterns. The development of tasks that require more holistic feature integration could encourage models to learn more global and causal representations.

  • Interpretability in Applications: BagNets hold potential for sectors where attributing decisions to specific features is crucial, such as in medical imaging diagnostics or visual inspection systems.
  • Attribution Benchmarking: With a structure that explicates feature contributions directly, BagNets could serve as benchmarks for evaluating and improving attribution methods used in more complex DNNs.
  • Broader Implications in AI: Recognizing that many neural architectures might not substantially deviate from past paradigms in terms of feature reliance challenges current perceptions of DNN advancements. This could steer future research towards fostering architectures that emulate human-like holistic perception and reasoning.

Conclusion

The paper makes a compelling case that BagNets bridge the gap between classic BoF models and modern neural architectures, delivering high performance with enhanced transparency. Although BagNets are restricted to purely local features, they closely reproduce the decision strategies of current DNNs; future work could leverage them to inspire architectures and tasks that prioritize interpretability and causal understanding. The contribution is thus both a practical tool for certain applications and a theoretical lens for reassessing the evolutionary path of neural networks.
