NAS-FCOS: Fast Neural Architecture Search for Object Detection
The development of object detection has been central to advances in computer vision, driven largely by convolutional detectors such as Faster R-CNN and RetinaNet. Designing these networks is difficult, however, because they must perform object localization and classification simultaneously. Neural Architecture Search (NAS) promises to automate this design process and reduce the manual effort involved, yet NAS methods typically demand extensive computational resources, making them impractical for broad use, especially in object detection. This paper introduces NAS-FCOS, which efficiently searches for the decoder architecture of an object detector, specifically the Feature Pyramid Network (FPN) and prediction head of the FCOS framework, using a tailored reinforcement learning search.
Key Contributions
Several unique contributions distinguish NAS-FCOS:
- Efficient Search Strategy: NAS-FCOS adopts a progressive search strategy that avoids full network training by searching only the decoder structures. This keeps proxy-task performance well correlated with performance on the target detection task, while cutting computation through cached feature extraction and lightweight tuning of the prediction heads.
- Improved Architecture: Using deformable and separable convolutions among its operation primitives (see the sketch after this list), NAS-FCOS discovers an architecture that delivers significant gains in average precision (AP) on the COCO dataset, outperforming strong baselines such as Faster R-CNN and RetinaNet while keeping resource demands comparable.
- Exploration of Decoder Spaces: The search investigates both FPN and prediction head modifications, leveraging shared head structures to enhance detection performance across varied scales.
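To make the operation primitives concrete, below is a minimal PyTorch sketch of the kinds of candidate operations a decoder search of this sort draws from: a depthwise-separable 3x3 convolution, a deformable 3x3 convolution, and a skip connection. The module names and the exact candidate set are illustrative assumptions rather than the paper's precise implementation; the deformable convolution uses torchvision.ops.DeformConv2d.

```python
# Illustrative decoder search-space primitives (assumed, not the paper's exact modules).
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d


class SeparableConv(nn.Module):
    """Depthwise-separable 3x3 convolution: depthwise conv followed by a 1x1 pointwise conv."""
    def __init__(self, channels):
        super().__init__()
        self.depthwise = nn.Conv2d(channels, channels, 3, padding=1, groups=channels)
        self.pointwise = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))


class DeformConvBlock(nn.Module):
    """3x3 deformable convolution whose sampling offsets are predicted from the input."""
    def __init__(self, channels):
        super().__init__()
        # 2 offsets (dx, dy) per kernel position: 2 * 3 * 3 = 18 channels.
        self.offset = nn.Conv2d(channels, 18, 3, padding=1)
        self.deform = DeformConv2d(channels, channels, 3, padding=1)

    def forward(self, x):
        return self.deform(x, self.offset(x))


# Candidate operations a controller could pick from when assembling an FPN or head.
CANDIDATE_OPS = {
    "skip": lambda c: nn.Identity(),
    "sep_conv_3x3": lambda c: SeparableConv(c),
    "deform_conv_3x3": lambda c: DeformConvBlock(c),
}
```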
Technical Approach
The NAS-FCOS architecture is formulated through:
- Search Space Definition: Separate search spaces are defined for the FPN and the prediction head, both built from basic operation primitives such as convolution variants and skip connections.
- Reinforcement Learning-Based Search: A reinforcement learning controller drives the search, evaluating candidates on a proxy task to avoid the high cost of full network training and accelerating evaluation through shared weights and cached feature representations; a minimal sketch of this loop follows the list.
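The sketch below illustrates how such a search can sidestep full network training: an LSTM controller samples decoder candidates, each candidate is scored on a proxy task using backbone features computed once and cached, and the controller is updated with a REINFORCE-style policy gradient against a moving-average baseline. The controller design, hyperparameters, and the helpers `build_decoder` and `proxy_eval` are assumptions for illustration, not the paper's exact procedure.

```python
# A minimal REINFORCE-style search loop, assuming candidates are scored on a
# proxy task over cached backbone features (helper names are illustrative).
import torch
import torch.nn as nn


class Controller(nn.Module):
    """LSTM controller emitting a sequence of discrete operation choices."""
    def __init__(self, num_ops, seq_len, hidden=64):
        super().__init__()
        self.seq_len = seq_len
        self.hidden = hidden
        self.lstm = nn.LSTMCell(hidden, hidden)
        self.embed = nn.Embedding(num_ops, hidden)
        self.head = nn.Linear(hidden, num_ops)

    def sample(self):
        device = self.head.weight.device
        h = torch.zeros(1, self.hidden, device=device)
        c = torch.zeros_like(h)
        inp = torch.zeros_like(h)
        actions, log_probs = [], []
        for _ in range(self.seq_len):
            h, c = self.lstm(inp, (h, c))
            dist = torch.distributions.Categorical(logits=self.head(h))
            action = dist.sample()                    # pick one op index
            actions.append(action.item())
            log_probs.append(dist.log_prob(action))
            inp = self.embed(action)                  # feed the choice back in
        return actions, torch.stack(log_probs).sum()


def search(controller, build_decoder, cached_features, proxy_eval,
           steps=300, lr=3e-4):
    """Sample decoders, reward them with their proxy-task detection score,
    and update the controller with a policy gradient and a moving baseline."""
    opt = torch.optim.Adam(controller.parameters(), lr=lr)
    baseline = 0.0
    for _ in range(steps):
        actions, log_prob = controller.sample()
        decoder = build_decoder(actions)               # FPN + head from sampled ops
        reward = proxy_eval(decoder, cached_features)  # quick score on a small split
        baseline = 0.9 * baseline + 0.1 * reward
        loss = -(reward - baseline) * log_prob
        opt.zero_grad()
        loss.backward()
        opt.step()
```

Because the backbone features are cached, each candidate only requires the lightweight decoder to be trained briefly on the proxy task, which is what makes evaluating many candidates tractable.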
Experimental Insights
- Comparison Metrics: NAS-FCOS achieves consistent AP improvements over hand-designed architectures across backbones such as MobileNetV2, ResNet, and ResNeXt, highlighting the efficacy of the proposed search strategy.
- Efficiency and Resource Savings: The architecture search completes in roughly four days on eight V100 GPUs, a substantial reduction relative to previous NAS efforts for detection and a clear indication of the method's efficiency.
Future Directions
The NAS-FCOS design is modular, suggesting future work on optimizing other components of detection architectures or adapting the approach to emerging backbones. The efficient search strategy may also extend to other vision tasks, offering further avenues for reducing the computational cost of NAS.
In conclusion, NAS-FCOS combines efficiency with effectiveness in neural architecture search, marking a promising step forward for object detection by reducing the computational cost of NAS while improving detection quality.