A Realistic Protocol for Evaluation of Weakly Supervised Object Localization (2404.10034v2)
Abstract: Weakly Supervised Object Localization (WSOL) allows training deep learning models for classification and localization (LOC) using only global class-level labels. The absence of bounding box (bbox) supervision during training makes hyper-parameter tuning, model selection, and evaluation challenging. In practice, WSOL methods rely on a validation set with bbox annotations for model selection, and on a test set with bbox annotations to estimate the threshold used to produce bboxes from localization maps. This approach, however, is not aligned with the WSOL setting, as these annotations are typically unavailable in real-world scenarios. Our initial empirical analysis shows a significant decline in LOC performance when model selection relies solely on class labels and threshold estimation relies solely on the image itself, compared to using manual bbox annotations. This highlights the importance of bbox-level information for model performance. In this paper, a new WSOL evaluation protocol is proposed that provides LOC information without the need for manual bbox annotations. In particular, we generate noisy pseudo-boxes from pretrained off-the-shelf region proposal methods such as Selective Search, CLIP, and RPN, and use them for model selection. These pseudo-bboxes are also employed to estimate the threshold applied to LOC maps, circumventing the need for test-set bbox annotations. Our experiments with several WSOL methods on the ILSVRC and CUB datasets show that using the proposed pseudo-bboxes for validation facilitates model selection and threshold estimation, with LOC performance comparable to that of models selected using ground-truth (GT) bboxes on the validation set with thresholds estimated on the test set. The proposed protocol also outperforms models that are selected using class-level labels and then dynamically thresholded based solely on LOC maps.
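The threshold-estimation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`iou`, `box_from_map`, `select_threshold`), the candidate thresholds, the IoU cutoff of 0.5, and the toy localization map are all assumptions made for illustration. Pseudo-boxes are taken as given, standing in for the output of a region proposal method, and the box is drawn around every above-threshold pixel, which is a simplification.

```python
from typing import List, Optional, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), inclusive pixel coordinates


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two inclusive pixel boxes."""
    iw = min(a[2], b[2]) - max(a[0], b[0]) + 1
    ih = min(a[3], b[3]) - max(a[1], b[1]) + 1
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    area_a = (a[2] - a[0] + 1) * (a[3] - a[1] + 1)
    area_b = (b[2] - b[0] + 1) * (b[3] - b[1] + 1)
    return inter / (area_a + area_b - inter)


def box_from_map(loc_map: Sequence[Sequence[float]], tau: float) -> Optional[Box]:
    """Tightest box around all pixels with activation >= tau (a simplification)."""
    xs = [x for row in loc_map for x, v in enumerate(row) if v >= tau]
    ys = [y for y, row in enumerate(loc_map) for v in row if v >= tau]
    if not xs:
        return None  # nothing survives the threshold
    return (min(xs), min(ys), max(xs), max(ys))


def select_threshold(
    loc_maps: List[Sequence[Sequence[float]]],
    pseudo_boxes: List[Box],
    taus: Sequence[float],
    iou_min: float = 0.5,
) -> Tuple[float, float]:
    """Pick the threshold whose boxes best agree with the pseudo-boxes.

    Agreement is the fraction of images whose predicted box reaches
    IoU >= iou_min against the (noisy) pseudo-box; no GT box is used.
    """
    best_tau, best_acc = taus[0], -1.0
    for tau in taus:
        hits = 0
        for loc_map, pseudo in zip(loc_maps, pseudo_boxes):
            box = box_from_map(loc_map, tau)
            if box is not None and iou(box, pseudo) >= iou_min:
                hits += 1
        acc = hits / len(loc_maps)
        if acc > best_acc:  # ties keep the earlier threshold
            best_tau, best_acc = tau, acc
    return best_tau, best_acc


# Toy 4x4 localization map with a bright 2x2 blob (hypothetical data).
toy_map = [
    [0.1, 0.2, 0.1, 0.0],
    [0.2, 0.9, 0.8, 0.1],
    [0.1, 0.8, 0.9, 0.1],
    [0.0, 0.1, 0.2, 0.3],
]
toy_pseudo_box = (1, 1, 2, 2)  # stands in for a region-proposal output
best_tau, best_acc = select_threshold([toy_map], [toy_pseudo_box], [0.05, 0.5, 0.95])
```

The agreement score returned alongside the threshold could also be used to rank checkpoints during model selection, since it plays the same role as a validation localization accuracy while relying only on pseudo-boxes.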