StarNet: towards Weakly Supervised Few-Shot Object Detection (2003.06798v3)

Published 15 Mar 2020 in cs.CV

Abstract: Few-shot detection and classification have advanced significantly in recent years. Yet, detection approaches require strong annotation (bounding boxes) both for pre-training and for adaptation to novel classes, and classification approaches rarely provide localization of objects in the scene. In this paper, we introduce StarNet - a few-shot model featuring an end-to-end differentiable non-parametric star-model detection and classification head. Through this head, the backbone is meta-trained using only image-level labels to produce good features for jointly localizing and classifying previously unseen categories of few-shot test tasks using a star-model that geometrically matches between the query and support images (to find corresponding object instances). Being a few-shot detector, StarNet does not require any bounding box annotations, neither during pre-training nor for novel classes adaptation. It can thus be applied to the previously unexplored and challenging task of Weakly Supervised Few-Shot Object Detection (WS-FSOD), where it attains significant improvements over the baselines. In addition, StarNet shows significant gains on few-shot classification benchmarks that are less cropped around the objects (where object localization is key).

Authors (11)

Leonid Karlinsky
Joseph Shtok
Amit Alfassy
Moshe Lichtenstein
Sivan Harary
Eli Schwartz
Sivan Doveh
Prasanna Sattigeri
Rogerio Feris
Alexander Bronstein
Raja Giryes

Summary

Overview of StarNet: Towards Weakly Supervised Few-Shot Object Detection

The paper "StarNet: towards Weakly Supervised Few-Shot Object Detection" addresses object detection and classification under minimal supervision. The authors introduce StarNet, a model designed for weakly supervised few-shot object detection (WS-FSOD). StarNet removes the bounding box annotation effort that few-shot detectors usually require, while still detecting and classifying novel object categories from only a few labeled examples.

Core Methodology and Contributions

StarNet uses an end-to-end differentiable, non-parametric star-model head for detection and classification. The approach relies solely on image-level labels, so no bounding box annotations are needed during either pre-training or adaptation to novel classes. This departs from conventional few-shot detectors, which require box annotations at both stages. Through this head, the backbone is meta-trained to produce features suited to jointly localizing and classifying novel categories: a star model geometrically matches the query and support images to find corresponding object instances.
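To make the idea of star-model matching more concrete, the following is a minimal sketch of Hough-style center voting between a query and a support feature grid. It is an illustration under stated assumptions, not the authors' implementation: the function names `cosine` and `star_vote`, the soft weighting with a `sigma` parameter, and the choice of the support grid's middle cell as the star center are all hypothetical. The actual StarNet head is differentiable and meta-trained, which this sketch does not attempt to reproduce.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors (lists of floats)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def star_vote(query, support, sigma=0.2):
    """Accumulate votes for where the support center lands in the query.

    query, support: H x W grids (nested lists) of feature vectors.
    Returns the vote map over query locations and the best (row, col).
    sigma is an assumed temperature for the soft match weight.
    """
    h, w = len(query), len(query[0])
    sh, sw = len(support), len(support[0])
    cr, cc = sh // 2, sw // 2  # assumed star center: middle of the support grid
    votes = [[0.0] * w for _ in range(h)]
    for qr in range(h):
        for qc in range(w):
            for sr in range(sh):
                for sc in range(sw):
                    # If query cell (qr, qc) matches support cell (sr, sc),
                    # the support center would sit at this query location.
                    tr, tc = qr + (cr - sr), qc + (cc - sc)
                    if 0 <= tr < h and 0 <= tc < w:
                        sim = cosine(query[qr][qc], support[sr][sc])
                        # Soft weight: 1 for identical features, small otherwise.
                        votes[tr][tc] += math.exp(-(1.0 - sim) / sigma)
    best = max(
        ((r, c) for r in range(h) for c in range(w)),
        key=lambda rc: votes[rc[0]][rc[1]],
    )
    return votes, best
```

On a toy query grid into which the support pattern has been pasted, the vote map peaks at the pasted pattern's center, because every matching cell pair agrees on the same center location while mismatched pairs contribute only small, scattered votes. This agreement among local matches is what lets image-level supervision yield a localization signal.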

The paper makes three key contributions:

Novel Task Introduction: The authors propose the WS-FSOD task, in which novel categories must be detected from a few examples that carry only image-level labels. This extends few-shot detection to new visual domains with minimal annotation investment.
StarNet Model: StarNet is designed for WS-FSOD and uses a differentiable geometric matching process between query and support images to localize and classify objects.
Performance on Classification Benchmarks: Beyond WS-FSOD, StarNet shows significant gains on few-shot classification benchmarks whose images are less tightly cropped around the objects, where object localization is key.

Experimental Results and Analysis

The empirical evaluation reported in the paper shows StarNet outperforming several baselines on multiple datasets, including ImageNetLOC-FS, CUB, and PASCAL VOC. The gains appear in the weakly supervised setting, where models that rely on bounding box annotations falter. Notably:

In WS-FSOD, StarNet consistently outperformed the baselines in both the 1-shot and 5-shot settings.
For few-shot classification, StarNet achieved higher accuracy on datasets where objects appear in cluttered scenes, which suggests robustness on real-world images where objects are not always centered or isolated.
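The 1-shot and 5-shot settings refer to the number of labeled support images available per novel class in each test episode. The sketch below shows one way such an episode could be drawn when only image-level labels exist. It assumes standard N-way K-shot episodic sampling; the summary does not spell out the paper's exact protocol, and the function name `sample_episode` and its parameters are hypothetical.

```python
import random


def sample_episode(labels_by_image, n_way=5, k_shot=1, n_query=2, seed=0):
    """Draw one N-way K-shot episode from a dict of image id -> class label.

    Only image-level labels are used; no bounding boxes are involved.
    Returns (support, query) as lists of (image_id, label) pairs.
    """
    rng = random.Random(seed)
    by_class = {}
    for image_id, label in sorted(labels_by_image.items()):
        by_class.setdefault(label, []).append(image_id)
    # A class is eligible only if it has enough images for support plus query.
    eligible = [c for c, ids in sorted(by_class.items())
                if len(ids) >= k_shot + n_query]
    if len(eligible) < n_way:
        raise ValueError("not enough classes with sufficient images")
    classes = rng.sample(eligible, n_way)
    support, query = [], []
    for c in classes:
        picked = rng.sample(by_class[c], k_shot + n_query)
        support += [(i, c) for i in picked[:k_shot]]   # labeled examples
        query += [(i, c) for i in picked[k_shot:]]     # images to evaluate on
    return support, query
```

Setting `k_shot=1` or `k_shot=5` corresponds to the two settings mentioned above; the support and query sets never share an image.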

Implications and Future Directions

The research has both theoretical and practical implications. Theoretically, StarNet challenges conventional approaches to object detection and classification by integrating geometric matching into a weakly supervised framework. Practically, it supports building few-shot models in domains where data annotation is laborious or resources are constrained, with possible applications in fields such as autonomous navigation and surveillance.

Future research could focus on improving the accuracy of partial detections and on testing the model in more complex visual scenes. More advanced matching algorithms and larger datasets could further improve its robustness and precision. Another possible direction is applying StarNet to other domains, such as 3D object detection, where its geometric matching could be beneficial.

In conclusion, StarNet is a significant step towards efficient and practical few-shot object detection. The authors make a compelling case for directing more attention to weakly supervised methods as a way to make advanced computer vision technologies more widely accessible.
