Scale-Aware Trident Networks for Object Detection
The paper "Scale-Aware Trident Networks for Object Detection" presents an advanced methodology to address the persistent challenge of scale variation in object detection. The proposed method introduces TridentNet, an innovative architecture that generates scale-specific feature maps efficiently through a multi-branch network structure with shared transformation parameters, leveraging different receptive fields across branches.
Core Contributions
- Controlled Receptive Field Experiments: The research begins with a controlled experiment examining how the receptive field influences detection performance across object scales. The findings reveal that the optimal receptive field size is strongly scale-dependent, motivating the design of a scale-aware network structure.
- Trident Network Architecture: TridentNet comprises parallel branches, each designed to handle a specific range of object scales through a tailored receptive field. The branches share identical convolution weights and differ only in their dilation rates, so their receptive fields vary at no extra parameter cost (see the first sketch after this list).
- Scale-Aware Training Scheme: A training scheme is proposed to specialize each branch. During training, a branch only samples objects within a scale range matched to its receptive field, avoiding the inefficiency of training on mismatched object scales (see the second sketch after this list).
- Fast Inference Approximation: For practical deployment, the paper introduces TridentNet Fast, which approximates the full performance of TridentNet using a single major branch during inference. This approach incurs no additional computational cost compared to a standard detector while maintaining significant performance gains.
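The following is a minimal PyTorch sketch of the weight-sharing idea behind a trident block: a single 3x3 convolution weight reused across branches that differ only in dilation rate. The module name TridentConv and its interface are illustrative assumptions, not the authors' reference implementation; the fast_inference flag mimics the single-major-branch behavior of TridentNet Fast described above.

```python
# Minimal sketch of a weight-sharing trident block (illustrative, not the
# authors' reference code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TridentConv(nn.Module):
    """One 3x3 convolution applied in parallel with several dilation rates.

    Every branch reuses the same weight tensor, so the parameter count matches
    a single-branch convolution; only the receptive field differs per branch.
    """
    def __init__(self, in_channels, out_channels, dilations=(1, 2, 3)):
        super().__init__()
        self.dilations = dilations
        # Shared parameters for all branches.
        self.weight = nn.Parameter(torch.empty(out_channels, in_channels, 3, 3))
        self.bias = nn.Parameter(torch.zeros(out_channels))
        nn.init.kaiming_normal_(self.weight, nonlinearity="relu")

    def forward(self, x, fast_inference=False):
        if fast_inference:
            # "TridentNet Fast"-style approximation: run only the middle branch.
            d = self.dilations[len(self.dilations) // 2]
            return [F.conv2d(x, self.weight, self.bias, padding=d, dilation=d)]
        # Full multi-branch mode: one scale-specific feature map per dilation.
        return [
            F.conv2d(x, self.weight, self.bias, padding=d, dilation=d)
            for d in self.dilations
        ]

# Example: three scale-specific feature maps produced from one set of weights.
feats = TridentConv(256, 256)(torch.randn(1, 256, 64, 64))
print([f.shape for f in feats])  # three tensors of shape (1, 256, 64, 64)
```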
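The second sketch illustrates the scale-aware sampling rule: each branch trains only on ground-truth boxes whose scale (square root of box area) falls inside that branch's valid range. The function name and the overlapping ranges used here are illustrative placeholders, not necessarily the paper's exact thresholds.

```python
# Hedged sketch of scale-aware sampling: assign ground-truth boxes to branches
# by scale. The valid ranges below are placeholders for illustration only.
import math

def assign_boxes_to_branches(gt_boxes, valid_ranges=((0, 90), (30, 160), (90, math.inf))):
    """Return, for each branch, the subset of boxes it should train on.

    gt_boxes: list of (x1, y1, x2, y2) in input-image coordinates.
    A box's scale is sqrt(width * height); a box may match several branches
    because the valid ranges are allowed to overlap.
    """
    per_branch = [[] for _ in valid_ranges]
    for box in gt_boxes:
        x1, y1, x2, y2 = box
        scale = math.sqrt(max(x2 - x1, 0) * max(y2 - y1, 0))
        for i, (lo, hi) in enumerate(valid_ranges):
            if lo <= scale <= hi:
                per_branch[i].append(box)
    return per_branch

# Example: a small, a medium, and a large box.
boxes = [(0, 0, 40, 40), (0, 0, 100, 100), (0, 0, 300, 300)]
for i, subset in enumerate(assign_boxes_to_branches(boxes)):
    print(f"branch {i}: {len(subset)} boxes")
```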
Key Results
- The TridentNet with a ResNet-101 backbone achieved a state-of-the-art single-model result of 48.4 mAP on the COCO dataset.
- Utilizing the default three-branch Trident architecture (with dilation rates set to 1, 2, and 3), the network demonstrates a consistent improvement over single-scale baselines and compares favorably against other advanced methods like FPN and ASPP.
- The weight-sharing strategy, combined with scale-aware training, yields a substantial performance increase, with notable AP improvements for small and large objects in particular.
Implications and Future Developments
The proposed TridentNet framework addresses the scale variation challenge by balancing the network's representational power across objects of varying sizes. Because the branches share a single set of transformation parameters, the architecture can also alleviate overfitting and improve generalization across object scales.
Practically, TridentNet's design and its fast approximation version offer significant benefits for real-world applications where inference speed and model efficiency are critical. The ability to achieve enhanced performance without additional computational overhead makes it suitable for deployment in environments with limited resources.
Theoretically, the concept of weight-sharing among branches with variable receptive fields may be extended to other computer vision tasks that struggle with multi-scale representation. Future research could explore integrating advanced techniques such as attention mechanisms or exploring different network backbones to further leverage the Trident architecture's capabilities.
Conclusion
The work in "Scale-Aware Trident Networks for Object Detection" provides a compelling solution to the issue of scale variation in object detection. By designing a network structure that generates scale-specific feature maps with shared parameters, the paper offers an efficient and robust framework for improving detection accuracy across scales. Beyond its incremental contribution to detection technology, this research sets a precedent for future exploration of adaptive receptive field networks.