
One-Shot Object Detection with Co-Attention and Co-Excitation (1911.12529v1)

Published 28 Nov 2019 in cs.CV, cs.LG, and eess.IV

Abstract: This paper aims to tackle the challenging problem of one-shot object detection. Given a query image patch whose class label is not included in the training data, the goal of the task is to detect all instances of the same class in a target image. To this end, we develop a novel {\em co-attention and co-excitation} (CoAE) framework that makes contributions in three key technical aspects. First, we propose to use the non-local operation to explore the co-attention embodied in each query-target pair and yield region proposals accounting for the one-shot situation. Second, we formulate a squeeze-and-co-excitation scheme that can adaptively emphasize correlated feature channels to help uncover relevant proposals and eventually the target objects. Third, we design a margin-based ranking loss for implicitly learning a metric to predict the similarity of a region proposal to the underlying query, no matter its class label is seen or unseen in training. The resulting model is therefore a two-stage detector that yields a strong baseline on both VOC and MS-COCO under one-shot setting of detecting objects from both seen and never-seen classes. Codes are available at https://github.com/timy90022/One-Shot-Object-Detection.

Citations (160)

Summary

  • The paper introduces CoAE, a framework that employs co-attention and co-excitation to effectively detect unseen object classes.
  • It integrates non-local object proposals, a squeeze-and-co-excitation module, and a margin-based ranking loss, reaching average precision of 63.8% on VOC and 22.0% on COCO for unseen classes.
  • The paper highlights the promise of combining attention strategies to advance class-agnostic one-shot detection and inspire further research in few-shot learning.

One-Shot Object Detection with Co-Attention and Co-Excitation

The paper presents CoAE, a novel co-attention and co-excitation framework for one-shot object detection. The central challenge it addresses is detecting all instances of an unseen class in a target image, guided only by a query image patch whose class label never appears in the training data. This capability is a step toward more flexible, human-like visual recognition systems.

Key Contributions

The CoAE approach is designed around three pivotal innovations:

  1. Non-Local Object Proposals: The framework introduces a co-attention mechanism that applies non-local operations to the convolutional feature maps of the query-target pair. This enriches the region proposal network (RPN) so that it generates proposals relevant to the one-shot query, accommodating unseen object classes at inference time (see the cross-attention sketch in the next section).
  2. Squeeze-and-Co-Excitation (SCE): This component emphasizes correlated feature channels across the query and target images. By squeezing both feature maps and then applying a shared co-excitation step, the method adaptively highlights the channels that matter for identifying the target objects (a minimal code sketch follows this list).
  3. Margin-Based Ranking Loss: The paper further proposes a margin-based ranking loss to drive metric learning. The loss implicitly learns to rank region proposals by their similarity to the query, regardless of whether the query class was seen during training, which is what enables one-shot detection of novel classes (also illustrated in the sketch below).
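
To make the squeeze-and-co-excitation idea and the margin-style objective concrete, the following is a minimal PyTorch sketch. It assumes query and target feature maps produced by a shared backbone; the names (SqueezeCoExcitation, margin_ranking_loss), the reduction ratio, and the margin value are illustrative assumptions rather than the authors' released implementation, and the paper's actual ranking loss differs in its exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SqueezeCoExcitation(nn.Module):
    """Squeeze both feature maps, then predict one shared channel-weighting
    vector ("co-excitation") that is applied to query and target alike.
    Hypothetical sketch, not the authors' released code."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(2 * channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, query_feat: torch.Tensor, target_feat: torch.Tensor):
        # Squeeze: global average pooling collapses each map to a channel descriptor.
        q = F.adaptive_avg_pool2d(query_feat, 1).flatten(1)   # (B, C)
        t = F.adaptive_avg_pool2d(target_feat, 1).flatten(1)  # (B, C)
        # Co-excitation: channel weights conditioned on both images jointly.
        w = self.fc(torch.cat([q, t], dim=1)).unsqueeze(-1).unsqueeze(-1)  # (B, C, 1, 1)
        return query_feat * w, target_feat * w


def margin_ranking_loss(scores: torch.Tensor,
                        is_match: torch.Tensor,
                        margin: float = 0.3) -> torch.Tensor:
    """Margin-style loss on proposal-vs-query similarity scores in [0, 1].

    Pushes scores of proposals matching the query class above 1 - margin and
    scores of non-matching proposals below margin. Illustrative only; the
    paper's exact formulation differs.
    """
    pos = is_match.float()
    loss_pos = pos * F.relu((1.0 - margin) - scores)    # matching proposals scored too low
    loss_neg = (1.0 - pos) * F.relu(scores - margin)    # non-matching proposals scored too high
    return (loss_pos + loss_neg).mean()
```

In the full detector, the re-weighted target features would feed the proposal and scoring stages, and a loss of this flavor would supervise the proposal-versus-query similarity scores.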

Implementation and Results

The authors adopt a two-stage detection pipeline built on the Faster R-CNN architecture and evaluate their approach on the PASCAL VOC and MS-COCO benchmarks. Under the one-shot setting, the method yields significant improvements over previous approaches such as Siamese-network baselines and metric-based classifiers.
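
As a rough illustration of where the co-attention step sits in such a pipeline, the sketch below shows a non-local-style cross-attention block that enriches the target feature map with query context before it reaches the RPN. This is a simplified, assumption-laden reconstruction (PyTorch, 1x1 convolution embeddings, a single attention direction), not the authors' code; the paper's co-attention operates over the query-target pair and may differ in detail.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class NonLocalCoAttention(nn.Module):
    """One direction of a non-local cross-attention: every target position
    attends over all query positions and receives query context."""

    def __init__(self, channels: int, inner: int = 256):
        super().__init__()
        self.theta = nn.Conv2d(channels, inner, kernel_size=1)  # embeds target positions
        self.phi = nn.Conv2d(channels, inner, kernel_size=1)    # embeds query positions
        self.g = nn.Conv2d(channels, inner, kernel_size=1)      # query "values"
        self.out = nn.Conv2d(inner, channels, kernel_size=1)    # projects context back

    def forward(self, target_feat: torch.Tensor, query_feat: torch.Tensor):
        B, _, H, W = target_feat.shape
        t = self.theta(target_feat).flatten(2).transpose(1, 2)  # (B, HW, D)
        q = self.phi(query_feat).flatten(2)                     # (B, D, hw)
        v = self.g(query_feat).flatten(2).transpose(1, 2)       # (B, hw, D)
        attn = F.softmax(t @ q, dim=-1)                         # (B, HW, hw) affinities
        ctx = (attn @ v).transpose(1, 2).reshape(B, -1, H, W)   # (B, D, H, W)
        # Residual: query-conditioned context is added to the target map,
        # which is then fed to the RPN so proposals become query-aware.
        return target_feat + self.out(ctx)
```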

The CoAE framework reaches strong average precision (AP) on unseen classes: 63.8% on VOC and 22.0% on COCO. These results are obtained without over-relying on pre-trained features for the unseen classes, as evidenced by experiments that use a reduced ImageNet dataset for pre-training.

Implications and Future Directions

The proposed framework not only establishes a strong baseline for one-shot object detection but also highlights the effectiveness of integrating non-local blocks and excitation mechanisms for detection. The co-attention strategy appears promising for further exploration in few-shot learning tasks, potentially extending to visual tracking and video analysis under similar constraints.

Theoretically, the work bears on generalizing object detection models to class-agnostic settings. Combining co-attention and co-excitation mechanisms could also inspire research into cross-modal retrieval or real-time adaptive recognition systems that operate without extensive labeled data.

A natural next step is to evaluate how well CoAE scales and adapts when many novel object classes appear in complex scenes, possibly in combination with unsupervised or semi-supervised learning to further refine this capability.

Overall, this work offers substantial directions for both practical implementations in object detection systems that require adaptability and theoretical advancements in one-shot learning methodologies.
