Adversarial Complementary Learning for Weakly Supervised Object Localization (1804.06962v1)

Published 19 Apr 2018 in cs.CV

Abstract: In this work, we propose Adversarial Complementary Learning (ACoL) to automatically localize integral objects of semantic interest with weak supervision. We first mathematically prove that class localization maps can be obtained by directly selecting the class-specific feature maps of the last convolutional layer, which paves a simple way to identify object regions. We then present a simple network architecture including two parallel-classifiers for object localization. Specifically, we leverage one classification branch to dynamically localize some discriminative object regions during the forward pass. Although it is usually responsive to sparse parts of the target objects, this classifier can drive the counterpart classifier to discover new and complementary object regions by erasing its discovered regions from the feature maps. With such an adversarial learning, the two parallel-classifiers are forced to leverage complementary object regions for classification and can finally generate integral object localization together. The merits of ACoL are mainly two-fold: 1) it can be trained in an end-to-end manner; 2) dynamically erasing enables the counterpart classifier to discover complementary object regions more effectively. We demonstrate the superiority of our ACoL approach in a variety of experiments. In particular, the Top-1 localization error rate on the ILSVRC dataset is 45.14%, which is the new state-of-the-art.

Summary

The paper introduces the ACoL framework, a dual-branch approach that uses adversarial erasing to reveal complete object regions from image-level labels.
The framework derives localization maps directly from the class-specific feature maps of the last convolutional layer during the forward pass, so no separate map-computation step is needed afterwards.
Experiments on ILSVRC and CUB-200-2011 show state-of-the-art results at the time of publication, including a Top-1 localization error rate of 45.14% on ILSVRC.

Adversarial Complementary Learning for Weakly Supervised Object Localization

The paper introduces Adversarial Complementary Learning (ACoL), a method for weakly supervised object localization (WSOL) that aims to localize entire object regions using only image-level labels. It addresses a common weakness of earlier methods such as Class Activation Mapping (CAM): a classification network tends to respond only to a few highly discriminative object parts, so the resulting localization maps cover the object only partially.

Methodological Insights

The ACoL framework is designed with simplicity and efficiency in mind. A key theoretical contribution is the proof that class localization maps can be obtained by directly selecting the class-specific feature maps of the last convolutional layer, rather than by computing them in a separate step after the forward pass, as CAM does. Localization maps are therefore available within the forward pass itself, which lets one classification branch localize discriminative regions dynamically during training.
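The equivalence can be checked numerically. The sketch below is a minimal illustration in plain Python, not the authors' code: it assumes a CAM-style head (global average pooling followed by a fully connected layer) and compares it with a 1x1 convolution that reuses the same weights and is followed by global average pooling. The helper names (`gap`, `conv1x1`, `gap_then_fc`) and the toy values are hypothetical.

```python
from typing import List

Map2D = List[List[float]]


def gap(m: Map2D) -> float:
    """Global average pooling of one 2-D map."""
    return sum(v for row in m for v in row) / (len(m) * len(m[0]))


def conv1x1(features: List[Map2D], weights: List[List[float]]) -> List[Map2D]:
    """1x1 convolution: weights[c][k] links feature channel k to class c.

    Returns one spatial map per class; each map is the weighted sum of the
    feature channels, which is the quantity CAM computes after the fact.
    """
    height, width = len(features[0]), len(features[0][0])
    return [
        [
            [sum(wk * fk[i][j] for wk, fk in zip(w_c, features)) for j in range(width)]
            for i in range(height)
        ]
        for w_c in weights
    ]


def gap_then_fc(features: List[Map2D], weights: List[List[float]]) -> List[float]:
    """CAM-style head: pool each channel first, then apply the linear layer."""
    pooled = [gap(fk) for fk in features]
    return [sum(wk * pk for wk, pk in zip(w_c, pooled)) for w_c in weights]


# Toy input: 2 feature channels of size 2x2, 2 classes (values are made up).
features = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 1.0], [1.0, 0.0]]]
weights = [[0.5, -1.0], [2.0, 0.25]]

class_maps = conv1x1(features, weights)        # available inside the forward pass
scores_direct = [gap(m) for m in class_maps]   # conv1x1 -> GAP
scores_cam = gap_then_fc(features, weights)    # GAP -> FC
```

Because both operations are linear, the two score vectors agree, and each per-class output of the 1x1 convolution is the same weighted sum of feature maps that CAM would compute in a separate step.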

ACoL uses a dual-branch network architecture composed of two parallel classifiers with complementary objectives. Classifier A responds to the most discriminative object parts, and the regions it discovers are erased from the feature maps passed to Classifier B. This erasure forces Classifier B to find complementary regions of the object, and fusing the localization maps of both classifiers yields a more complete object localization.
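The erase-then-fuse idea can be sketched in a few lines. The summary does not spell out the erasing rule or the fusion operator, so the following makes explicit assumptions for illustration: Classifier A's map is min-max normalized, positions above a threshold `delta` are set to zero in every feature channel, and the two branches' maps are fused with an element-wise maximum. All function names and values are hypothetical.

```python
from typing import List

Map2D = List[List[float]]


def normalize(m: Map2D) -> Map2D:
    """Min-max scale a 2-D map to [0, 1]; a constant map becomes all zeros."""
    flat = [v for row in m for v in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    if span == 0:
        return [[0.0 for _ in row] for row in m]
    return [[(v - lo) / span for v in row] for row in m]


def erase(features: List[Map2D], loc_map_a: Map2D, delta: float) -> List[Map2D]:
    """Zero every channel wherever branch A's normalized map exceeds delta.

    Assumption: threshold-based erasing with zeros as the replacement value.
    """
    norm = normalize(loc_map_a)
    return [
        [
            [0.0 if norm[i][j] > delta else v for j, v in enumerate(row)]
            for i, row in enumerate(channel)
        ]
        for channel in features
    ]


def fuse(map_a: Map2D, map_b: Map2D) -> Map2D:
    """Assumed fusion: element-wise maximum of the two localization maps."""
    return [[max(a, b) for a, b in zip(ra, rb)] for ra, rb in zip(map_a, map_b)]


# Toy example: one 2x2 feature channel; branch A fires strongly at position (0, 1).
features = [[[1.0, 2.0], [3.0, 4.0]]]
loc_map_a = [[0.0, 10.0], [5.0, 2.0]]

erased = erase(features, loc_map_a, delta=0.6)   # input for Classifier B
fused = fuse(normalize(loc_map_a), [[0.9, 0.0], [0.1, 0.8]])
```

In this toy case only position (0, 1) is erased, so a second classifier fed `erased` can no longer rely on that location and must draw on the remaining positions.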

Empirical Results

In experiments on the ILSVRC and CUB-200-2011 datasets, ACoL reports state-of-the-art performance at the time of publication. Notably, it achieves a Top-1 localization error rate of 45.14% on the ILSVRC dataset. These results support the claim that adversarial erasing uncovers object regions that a single classification branch tends to miss.
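For readers unfamiliar with the metric, the sketch below shows how a Top-1 localization error is commonly computed: a prediction counts as correct only if the top-scoring class matches the ground-truth label and the predicted box overlaps the ground-truth box with an intersection over union (IoU) of at least 0.5. This is a simplified illustration that assumes one ground-truth box per image and boxes given as `(x1, y1, x2, y2)`; the function names and sample data are hypothetical, and the paper's exact evaluation protocol may differ in detail.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]   # (x1, y1, x2, y2)
Sample = Tuple[int, Box, int, Box]        # (pred_label, pred_box, gt_label, gt_box)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def top1_loc_error(samples: List[Sample], iou_threshold: float = 0.5) -> float:
    """Fraction of samples with a wrong top-1 class or insufficient box overlap."""
    wrong = sum(
        1
        for pred_label, pred_box, gt_label, gt_box in samples
        if pred_label != gt_label or iou(pred_box, gt_box) < iou_threshold
    )
    return wrong / len(samples)


samples = [
    (1, (0, 0, 10, 10), 1, (0, 0, 10, 10)),   # right class, perfect box
    (1, (0, 0, 10, 10), 2, (0, 0, 10, 10)),   # wrong class
    (3, (0, 0, 10, 10), 3, (5, 0, 15, 10)),   # right class, IoU = 1/3
    (4, (0, 0, 10, 10), 4, (0, 0, 10, 8)),    # right class, IoU = 0.8
]
error = top1_loc_error(samples)
```

Under this definition a classification mistake counts as a localization error even when the box is perfect, which is why Top-1 localization error is always at least as high as Top-1 classification error.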

Theoretical and Practical Implications

The theoretical foundation laid by the paper simplifies the integration of object localization capabilities into existing deep learning architectures without incurring significant additional computational cost. Practically, this allows for the deployment of more efficient WSOL systems, reducing the dependency on labor-intensive bounding box annotations. This step towards more autonomous and adaptable object localization models aligns with trends seeking to minimize manual intervention in model training.

Future Directions

The research opens up avenues for exploring how adversarial erasing can improve weakly supervised tasks beyond object localization, such as semantic segmentation and action recognition. Combining this approach with other architectural advances, such as attention mechanisms or transformers, could yield further gains in model interpretability and accuracy.

Overall, the ACoL framework advances weakly supervised learning by pairing a simple theoretical result on localization maps with an end-to-end trainable erasing scheme that localizes objects more completely.
