- The paper introduces an enhanced evaluation protocol for WSOL that uses limited full supervision for effective hyperparameter tuning.
- It proposes new performance metrics, MaxBoxAccV2 and mPxAP, that decouple localization performance from classification performance.
- It standardizes dataset splits across ImageNet, CUB, and OpenImages, enabling consistent comparisons and highlighting WSOL limitations.
Overview of "Evaluation for Weakly Supervised Object Localization: Protocol, Metrics, and Datasets"
The paper "Evaluation for Weakly Supervised Object Localization: Protocol, Metrics, and Datasets" critically examines how Weakly-Supervised Object Localization (WSOL) techniques are evaluated and proposes remedies for key shortcomings: a refined evaluation protocol, new performance metrics, and standardized benchmark dataset splits.
Core Contributions
- Evaluation Protocol: The authors show that WSOL is a fundamentally ill-posed problem when only image-level labels are available for object localization. They therefore advocate an evaluation protocol in which a small held-out set carries full localization supervision (bounding boxes or masks). This held-out set supports hyperparameter tuning and model selection without ever touching the test set.
- Performance Metrics: Traditional evaluation metrics often conflate classification and localization performance, leading to ambiguous interpretations. The paper introduces MaxBoxAccV2, which measures localization accuracy alone: the score map is binarized at an operating threshold, the resulting box is compared against the ground truth, and accuracies are averaged over several IoU (Intersection over Union) thresholds while the best operating threshold is selected per method. Mean pixel average precision (mPxAP) is also proposed for datasets where pixel-wise mask annotations are available.
- Comprehensive Dataset Splits: The paper standardizes splits of three datasets (ImageNet, CUB, and OpenImages) into train-weaksup (image-level labels only), train-fullsup (full localization labels), and test sets, unifying evaluation across WSOL methodologies.
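The protocol above can be illustrated with a short sketch. The function names and the grid-search loop here are hypothetical, not from the paper; the point is only that hyperparameters are chosen by localization performance on the fully supervised held-out set, while training itself sees only image-level labels:

```python
def select_hyperparameters(configs, train_model, localization_score):
    """Sketch of the proposed WSOL protocol (names are illustrative).

    train_model(cfg)        -- trains on train-weaksup (image-level labels only)
    localization_score(m)   -- evaluates localization (e.g. MaxBoxAccV2)
                               on train-fullsup, never on the test set
    """
    best_cfg, best_score = None, float("-inf")
    for cfg in configs:
        model = train_model(cfg)
        score = localization_score(model)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg  # final numbers are then reported once on the test set
```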
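To make the metric concrete, here is a simplified MaxBoxAccV2-style computation. This is a sketch under assumptions: one ground-truth box per image, and the box is drawn around all activated pixels, ignoring the paper's connected-component and multi-contour details:

```python
import numpy as np

def box_from_mask(mask):
    """Tightest box (x0, y0, x1, y1) around activated pixels; None if empty."""
    ys, xs = np.where(mask)
    if len(xs) == 0:
        return None
    return xs.min(), ys.min(), xs.max() + 1, ys.max() + 1

def iou(a, b):
    """Intersection over union of two (x0, y0, x1, y1) boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def max_box_acc_v2(score_maps, gt_boxes,
                   taus=np.linspace(0.05, 0.95, 19),
                   deltas=(0.3, 0.5, 0.7)):
    """For each IoU threshold delta, pick the best score-map threshold tau,
    then average the resulting box accuracies over the deltas."""
    accs = []
    for delta in deltas:
        best = 0.0
        for tau in taus:
            hits = 0
            for smap, gt in zip(score_maps, gt_boxes):
                box = box_from_mask(smap >= tau)
                if box is not None and iou(box, gt) >= delta:
                    hits += 1
            best = max(best, hits / len(score_maps))
        accs.append(best)
    return float(np.mean(accs))
```

Averaging over IoU thresholds (rather than fixing IoU = 0.5) is what distinguishes the V2 metric: it rewards score maps that localize tightly as well as coarsely.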
Analytical Findings
- Method Comparisons: The empirical evaluations compare six WSOL methods (e.g., CAM, HaS, ACoL) across three widely used architectures (VGG, Inception, ResNet). Results under the proposed protocol reveal that subsequent WSOL methods have not substantially surpassed CAM, challenging previous findings that claimed significant improvements.
- Saliency Methods as WSOL Baselines: The paper also evaluates visual interpretability methods such as Guided Backprop and Integrated Gradients as WSOL baselines, and finds that they typically underperform CAM.
- Few-shot Learning (FSL) Baselines: In scenarios where limited full supervision is available, FSL methods tend to outperform WSOL methods, even using simple saliency network architectures. This result emphasizes the potential utility of direct localization training when some fully labeled data are accessible.
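Since CAM remains the strongest baseline under the new protocol, it is worth recalling what it computes: the score map for a class is the weighted sum of the final convolutional feature maps, with weights taken from that class's row of the classification layer. A minimal numpy sketch (the array shapes and normalization are assumptions for illustration, not the paper's code):

```python
import numpy as np

def cam(features, fc_weights, class_idx):
    """Class Activation Map for one image.

    features   -- (K, H, W) activations of the last conv layer
    fc_weights -- (num_classes, K) weights of the final linear classifier
    class_idx  -- target class whose activation map is produced
    """
    # Contract the K channel axis: sum_k w[class_idx, k] * features[k]
    smap = np.tensordot(fc_weights[class_idx], features, axes=1)  # (H, W)
    # Min-max normalize to [0, 1] so a box threshold can be applied.
    smap = smap - smap.min()
    if smap.max() > 0:
        smap = smap / smap.max()
    return smap
```

The thresholding of this normalized map is exactly where the evaluation pitfalls arise: the operating threshold is a hyperparameter, which is why the protocol insists it be tuned on train-fullsup rather than on the test set.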
Implications and Future Directions
The authors highlight the importance of separating classification accuracy from localization capability within the WSOL task to more accurately assess method effectiveness. Furthermore, the findings suggest that integrating a modest amount of fully supervised data can be beneficial, a notion that could inspire a paradigmatic shift towards semi-weakly-supervised approaches.
For future directions, the authors recommend exploring learning paradigms that leverage both weak and full supervision and rethinking training setups to resolve the intrinsic ill-posedness of WSOL. Additionally, the inclusion of diverse background-class images could aid in mitigating some of the existing challenges in distinguishing foreground objects.
The paper provides a comprehensive framework for benchmarking WSOL, aligning WSOL more closely with the challenges of real-world applications and fostering a deeper understanding of model limitations and capabilities.
In conclusion, this paper serves as a critical resource for WSOL research, emphasizing methodological clarity and suggesting practical pathways for both current and future exploration in the area of weakly supervised learning.