NAS-FCOS: Fast Neural Architecture Search for Object Detection
The development of object detection has been central to advances in computer vision, driven largely by convolutional detectors such as Faster R-CNN and RetinaNet. Designing these networks is difficult, however, because they must perform object localization and classification simultaneously. Neural Architecture Search (NAS) promises to automate this design process and reduce the manual effort involved, yet NAS methods typically demand extensive computational resources, making them impractical for broad use, especially in object detection. This paper introduces NAS-FCOS, which efficiently searches for the decoder architecture of an object detector, specifically the Feature Pyramid Network (FPN) and prediction head of the FCOS framework, using a tailored reinforcement learning search.
Key Contributions
Several unique contributions distinguish NAS-FCOS:
- Efficient Search Strategy: NAS-FCOS adopts a progressive search strategy that avoids full network training by searching only the decoder structures. This keeps proxy-task performance well correlated with performance on the target detection task, while cutting computation through cached feature extraction and lightweight tuning of the prediction heads.
- Improved Architecture: Using deformable and separable convolutions among its operation primitives (see the sketch after this list), NAS-FCOS discovers an architecture that delivers significant gains in average precision (AP) on the COCO dataset, outperforming strong baselines such as Faster R-CNN and RetinaNet while keeping resource demands comparable.
- Exploration of Decoder Spaces: The search investigates both FPN and prediction head modifications, leveraging shared head structures to enhance detection performance across varied scales.
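To make the operation primitives concrete, below is a minimal PyTorch sketch of the kinds of candidate operations a decoder search of this sort draws from: a depthwise-separable 3x3 convolution, a deformable 3x3 convolution, and a skip connection. The module names and the exact candidate set are illustrative assumptions rather than the paper's precise implementation; the deformable convolution uses torchvision.ops.DeformConv2d.

```python
# Illustrative decoder search-space primitives (assumed, not the paper's exact modules).
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d


class SeparableConv(nn.Module):
    """Depthwise-separable 3x3 convolution: depthwise conv followed by a 1x1 pointwise conv."""
    def __init__(self, channels):
        super().__init__()
        self.depthwise = nn.Conv2d(channels, channels, 3, padding=1, groups=channels)
        self.pointwise = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))


class DeformConvBlock(nn.Module):
    """3x3 deformable convolution whose sampling offsets are predicted from the input."""
    def __init__(self, channels):
        super().__init__()
        # 2 offsets (dx, dy) per kernel position: 2 * 3 * 3 = 18 channels.
        self.offset = nn.Conv2d(channels, 18, 3, padding=1)
        self.deform = DeformConv2d(channels, channels, 3, padding=1)

    def forward(self, x):
        return self.deform(x, self.offset(x))


# Candidate operations a controller could pick from when assembling an FPN or head.
CANDIDATE_OPS = {
    "skip": lambda c: nn.Identity(),
    "sep_conv_3x3": lambda c: SeparableConv(c),
    "deform_conv_3x3": lambda c: DeformConvBlock(c),
}
```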
Technical Approach
The NAS-FCOS architecture is formulated through:
- Search Space Definition: Separate search spaces are defined for the FPN and the prediction head, both built from basic operation primitives such as convolution variants and skip connections.
- Reinforcement Learning-Based Search: A reinforcement learning controller drives the search, evaluating candidates on a proxy task to avoid the high cost of full network training and accelerating evaluation through shared weights and cached feature representations; a minimal sketch of this loop follows the list.
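The sketch below illustrates how such a search can sidestep full network training: an LSTM controller samples decoder candidates, each candidate is scored on a proxy task using backbone features computed once and cached, and the controller is updated with a REINFORCE-style policy gradient against a moving-average baseline. The controller design, hyperparameters, and the helpers `build_decoder` and `proxy_eval` are assumptions for illustration, not the paper's exact procedure.

```python
# A minimal REINFORCE-style search loop, assuming candidates are scored on a
# proxy task over cached backbone features (helper names are illustrative).
import torch
import torch.nn as nn


class Controller(nn.Module):
    """LSTM controller emitting a sequence of discrete operation choices."""
    def __init__(self, num_ops, seq_len, hidden=64):
        super().__init__()
        self.seq_len = seq_len
        self.hidden = hidden
        self.lstm = nn.LSTMCell(hidden, hidden)
        self.embed = nn.Embedding(num_ops, hidden)
        self.head = nn.Linear(hidden, num_ops)

    def sample(self):
        device = self.head.weight.device
        h = torch.zeros(1, self.hidden, device=device)
        c = torch.zeros_like(h)
        inp = torch.zeros_like(h)
        actions, log_probs = [], []
        for _ in range(self.seq_len):
            h, c = self.lstm(inp, (h, c))
            dist = torch.distributions.Categorical(logits=self.head(h))
            action = dist.sample()                    # pick one op index
            actions.append(action.item())
            log_probs.append(dist.log_prob(action))
            inp = self.embed(action)                  # feed the choice back in
        return actions, torch.stack(log_probs).sum()


def search(controller, build_decoder, cached_features, proxy_eval,
           steps=300, lr=3e-4):
    """Sample decoders, reward them with their proxy-task detection score,
    and update the controller with a policy gradient and a moving baseline."""
    opt = torch.optim.Adam(controller.parameters(), lr=lr)
    baseline = 0.0
    for _ in range(steps):
        actions, log_prob = controller.sample()
        decoder = build_decoder(actions)               # FPN + head from sampled ops
        reward = proxy_eval(decoder, cached_features)  # quick score on a small split
        baseline = 0.9 * baseline + 0.1 * reward
        loss = -(reward - baseline) * log_prob
        opt.zero_grad()
        loss.backward()
        opt.step()
```

Because the backbone features are cached, each candidate only requires the lightweight decoder to be trained briefly on the proxy task, which is what makes evaluating many candidates tractable.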
Experimental Insights
- Comparison Metrics: NAS-FCOS achieves consistent AP improvements over hand-designed architectures across backbones such as MobileNetV2, ResNet, and ResNeXt, highlighting the efficacy of the proposed search strategy.
- Efficiency and Resource Savings: The architecture search completes in roughly four days on eight V100 GPUs, a substantial reduction relative to previous NAS efforts for detection and a clear indication of the method's efficiency.
Future Directions
The NAS-FCOS design is modular, suggesting future work on optimizing other components of detection architectures or adapting the approach to emerging backbones. The efficient search strategy may also extend to other vision tasks, offering further avenues for reducing the computational cost of NAS.
In conclusion, NAS-FCOS combines efficiency with effectiveness in neural architecture search, marking a promising step forward for object detection by reducing the computational cost of NAS while improving detection quality.