PCL: Proposal Cluster Learning for Weakly Supervised Object Detection (1807.03342v2)

Published 9 Jul 2018 in cs.CV

Abstract: Weakly Supervised Object Detection (WSOD), using only image-level annotations to train object detectors, is of growing importance in object recognition. In this paper, we propose a novel deep network for WSOD. Unlike previous networks that transfer the object detection problem to an image classification problem using Multiple Instance Learning (MIL), our strategy generates proposal clusters to learn refined instance classifiers by an iterative process. The proposals in the same cluster are spatially adjacent and associated with the same object. This prevents the network from concentrating too much on parts of objects instead of whole objects. We first show that instances can be assigned object or background labels directly based on proposal clusters for instance classifier refinement, and then show that treating each cluster as a small new bag yields fewer ambiguities than the directly assigning label method. The iterative instance classifier refinement is implemented online using multiple streams in convolutional neural networks, where the first is an MIL network and the others are for instance classifier refinement supervised by the preceding one. Experiments are conducted on the PASCAL VOC, ImageNet detection, and MS-COCO benchmarks for WSOD. Results show that our method outperforms the previous state of the art significantly.

Summary

The paper presents a novel method that forms proposal clusters to enhance object detection using only image-level annotations.
It employs an iterative instance classifier refinement strategy within an online end-to-end training framework to reduce detection ambiguities.
Empirical results demonstrate a significant improvement, achieving 48.8% mAP on PASCAL VOC 2007 and outperforming previous WSOD methods.

Overview of "PCL: Proposal Cluster Learning for Weakly Supervised Object Detection"

The paper "PCL: Proposal Cluster Learning for Weakly Supervised Object Detection" introduces a new approach to the Weakly Supervised Object Detection (WSOD) problem, which is of growing importance in object recognition. The goal is to train object detectors using only image-level annotations, avoiding the costly collection of detailed bounding-box annotations.

The authors propose a deep network architecture designed specifically for WSOD. Previous networks use Multiple Instance Learning (MIL) to recast object detection as an image classification problem. PCL keeps an MIL network as its first stream but adds refinement streams that learn from "proposal clusters," groups of spatially adjacent proposals associated with the same object. This discourages the network from concentrating on object parts instead of whole objects, narrowing the gap between image classification and object detection.
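The grouping of spatially adjacent proposals can be illustrated with a minimal sketch. It assumes that adjacency is measured by intersection over union (IoU) against the highest-scoring proposal; the function names `iou` and `build_cluster`, the 0.5 threshold, and the toy boxes and scores are all hypothetical choices made for illustration, not values taken from the paper.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def build_cluster(boxes, scores, iou_threshold=0.5):
    """Group proposals around the highest-scoring one (assumed cluster center).

    Returns (center_index, member_indices). The threshold is an assumption.
    """
    center = max(range(len(boxes)), key=lambda i: scores[i])
    members = [
        i for i, box in enumerate(boxes)
        if iou(boxes[center], box) >= iou_threshold
    ]
    return center, members


# Toy proposals: two near-duplicates of the object, one small part, one far away.
boxes = [
    (10, 10, 110, 110),    # high-scoring proposal covering the object
    (20, 15, 115, 105),    # overlaps heavily with the first
    (12, 12, 60, 60),      # covers only part of the object
    (200, 200, 260, 260),  # far from the object
]
scores = [0.9, 0.6, 0.7, 0.1]

center, members = build_cluster(boxes, scores)
print(center, members)
```

In this toy input the cluster contains the two proposals that overlap heavily, while the small part-only box and the distant box fall below the assumed threshold and are left out.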

Key Technical Contributions

Proposal Cluster Generation: Ground-truth bounding boxes are unavailable under weak supervision, so the network must identify proposal clusters itself. Proposals with high classification scores serve as initial cluster centers, and the clusters are refined iteratively so that each one corresponds to a distinct object.
Iterative Instance Classifier Refinement: The model runs multiple streams within a single convolutional network. The first is an MIL network, and each later stream refines the instance classifier under supervision from the preceding stream. Treating each proposal cluster as a small new "bag," a modified MIL strategy, yields fewer ambiguities than assigning labels to proposals directly.
Online End-to-End Training: Rather than alternating between separate training stages, the authors train everything online within a single network. The refined classifiers are learned concurrently with the base network, which reduces computational expense and improves efficacy.
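The following sketch shows, under simplifying assumptions, how a refinement stream might be supervised by the preceding stream and how a cluster can be treated as a bag. The adjacency map `neighbors`, the helper names, and the toy scores are hypothetical. The loss here is simply the negative log of the mean score over the cluster, with no weighting and no background term, so it should not be read as the paper's actual loss.

```python
import math


def pick_center(scores):
    """Index of the highest-scoring proposal (assumed cluster center)."""
    return max(range(len(scores)), key=lambda i: scores[i])


def cluster_bag_loss(scores, members):
    """Treat the cluster as one bag: negative log of the mean member score."""
    mean_score = sum(scores[i] for i in members) / len(members)
    return -math.log(mean_score)


def direct_label_loss(scores, members):
    """Baseline: label every member as the object and average the log losses."""
    return -sum(math.log(scores[i]) for i in members) / len(members)


def refinement_losses(stream_scores, neighbors):
    """stream_scores[0] is the MIL stream; stream k >= 1 is supervised by k - 1.

    neighbors[i] is a hypothetical precomputed set of proposals adjacent to i.
    """
    losses = []
    for k in range(1, len(stream_scores)):
        center = pick_center(stream_scores[k - 1])   # supervision from prior stream
        members = sorted({center} | neighbors[center])
        losses.append(cluster_bag_loss(stream_scores[k], members))
    return losses


# Toy data: 4 proposals, scores for one class from 3 streams (MIL + 2 refinements).
neighbors = {0: {1}, 1: {0}, 2: set(), 3: set()}
stream_scores = [
    [0.8, 0.4, 0.5, 0.1],  # stream 0: MIL network
    [0.7, 0.5, 0.3, 0.1],  # stream 1: first refinement
    [0.9, 0.7, 0.2, 0.1],  # stream 2: second refinement
]

losses = refinement_losses(stream_scores, neighbors)
print(losses)
```

Under this simplified loss, the bag form is never larger than the direct-label form for the same cluster, since the log of a mean is at least the mean of the logs. A cluster is therefore penalized less when only some of its members score high, which is one way to read the claim that bags carry fewer ambiguities than per-proposal labels.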

Empirical Results

Experiments on the PASCAL VOC, ImageNet detection, and MS-COCO benchmarks show significant quantitative improvements. On the PASCAL VOC 2007 dataset, the method achieved a mean Average Precision (mAP) of 48.8%, outperforming the previous state of the art.

Implications and Future Work

The Proposal Cluster Learning approach suggests that object detection without bounding-box annotations can become more accurate and computationally feasible. Practically, this work paves the way for more scalable object detection systems that can use the large, weakly labeled datasets now widely available.

Theoretically, the idea of combining an adaptive number of proposal clusters with spatial adjacency shows promise in other weakly supervised settings, such as semantic segmentation, which may benefit from a similar cluster-based learning paradigm.

Future research may refine the PCL framework's robustness by integrating spatial contextual information explicitly or by applying these methods to diverse applications beyond traditional datasets. Furthermore, understanding the interplay between network depth, cluster generation strategies, and classification refinement could unearth deeper insights into weakly supervised learning paradigms.

Overall, the paper presents a comprehensive methodology for weakly supervised object detection that combines proposal cluster generation with iterative instance classifier refinement, trained online and end to end.
