ScratchDet: Training Single-Shot Object Detectors from Scratch (1810.08425v4)

Published 19 Oct 2018 in cs.CV

Abstract: Current state-of-the-art object detectors are fine-tuned from off-the-shelf networks pretrained on the large-scale classification dataset ImageNet, which incurs additional problems: 1) classification and detection have different degrees of sensitivity to translation, resulting in a learning objective bias; 2) the architecture is constrained by the classification network, making modification inconvenient. To cope with these problems, training detectors from scratch is a feasible solution. However, detectors trained from scratch generally perform worse than pretrained ones and can even suffer from convergence issues during training. In this paper, we explore how to train object detectors from scratch robustly. By analysing previous work on the optimization landscape, we find that one of the overlooked points in current trained-from-scratch detectors is BatchNorm. Resorting to the stable and predictable gradients brought by BatchNorm, detectors can be trained from scratch stably while keeping favourable performance independent of the network architecture. Taking advantage of this, we are able to explore various types of networks for object detection without suffering from poor convergence. Through extensive experiments and analyses of the downsampling factor, we propose the Root-ResNet backbone network, which makes full use of the information from the original images. Our ScratchDet achieves state-of-the-art accuracy on PASCAL VOC 2007, 2012 and MS COCO among all train-from-scratch detectors and even performs better than several one-stage pretrained methods. Code will be made publicly available at https://github.com/KimSoybean/ScratchDet.

Citations (122)

Summary

  • The paper demonstrates that integrating BatchNorm in both backbone and detection head enables stable training from scratch.
  • It introduces the Root-ResNet architecture to preserve image details, substantially improving detection of small objects.
  • Empirical results show that ScratchDet achieves superior mAP on the VOC and COCO benchmarks compared to other train-from-scratch detectors and several pretrained one-stage methods.

Insights into ScratchDet: Training Single-Shot Object Detectors from Scratch

The paper "ScratchDet: Training Single-Shot Object Detectors from Scratch" introduces a novel approach to object detection by shifting away from the prevalent reliance on pretrained networks. Instead, it explores training object detectors from scratch, addressing key challenges associated with this process, such as performance degradation and convergence issues. The authors posit that the application of Batch Normalization (BatchNorm) across both backbone and detection head subnetworks is critical in facilitating stable and effective training from scratch.

Objective Analysis

Traditional object detection frameworks often build on networks such as VGGNet and ResNet pretrained on large datasets like ImageNet. These frameworks come with inherent limitations. The primary issue is that classification and detection exhibit different sensitivities to translation, which introduces a learning objective bias. Furthermore, architectural rigidity hinders adaptation and necessitates costly re-training for new designs.

In response, this paper argues for a detector trained entirely from scratch, bypassing the constraints of pretrained networks. A significant focus lies on the optimization landscape of training. Detectors trained without pretraining typically face convergence hurdles, and the authors identify BatchNorm, largely overlooked in prior trained-from-scratch detectors, as the key to overcoming them. By smoothing the optimization landscape, BatchNorm permits stable training with higher learning rates, which lets the optimizer explore a larger solution space and reach better optima.
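The idea is straightforward to express in code. The following is a minimal PyTorch sketch, not the authors' implementation: every convolution in the backbone and in the SSD-style detection head is followed by BatchNorm, and the larger base learning rate this enables is set on the optimizer. Channel counts, anchor counts, and the exact learning rate (0.05 here) are illustrative assumptions.

```python
# Minimal sketch (not the authors' code): BatchNorm after every convolution
# in both backbone and detection head, which is what makes a larger base
# learning rate viable when training from scratch.
import torch
import torch.nn as nn

def conv_bn_relu(in_ch, out_ch, stride=1):
    """3x3 convolution followed by BatchNorm and ReLU."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=stride, padding=1, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

class DetectionHead(nn.Module):
    """SSD-style head: a conv-BN-ReLU trunk before the class/box predictors."""
    def __init__(self, in_ch, num_anchors, num_classes):
        super().__init__()
        self.trunk = conv_bn_relu(in_ch, in_ch)  # BatchNorm inside the head
        self.cls = nn.Conv2d(in_ch, num_anchors * num_classes, 3, padding=1)
        self.loc = nn.Conv2d(in_ch, num_anchors * 4, 3, padding=1)

    def forward(self, x):
        x = self.trunk(x)
        return self.cls(x), self.loc(x)

# With BatchNorm throughout, a larger base learning rate (0.05 in this
# sketch) can be used instead of the small rates pretrained SSD needs.
head = DetectionHead(in_ch=256, num_anchors=6, num_classes=21)
optimizer = torch.optim.SGD(head.parameters(), lr=0.05, momentum=0.9, weight_decay=5e-4)
```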

Core Contributions

  1. Integration of BatchNorm: The incorporation of BatchNorm into both backbone and head enhances training stability, independent of the network architecture. This contribution serves as a pivotal step towards fostering robust training dynamics in single-shot detectors.
  2. Root-ResNet Architecture: The backbone is redesigned around the Root-ResNet architecture, which modifies the initial downsampling layers so that more information from the input images is retained. This design is particularly beneficial for detecting small objects, a long-standing challenge for object detectors (a sketch of the root block follows this list).
  3. Empirical Results: Extensive experiments demonstrate ScratchDet's efficacy, establishing new benchmarks on the PASCAL VOC 2007, 2012, and MS COCO datasets. Notably, ScratchDet achieves higher mean Average Precision (mAP) than other train-from-scratch detectors and several pretrained methods, improving the state-of-the-art mAP on COCO by 2.7%.
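
As a concrete illustration of the second contribution, the sketch below shows one way the Root-ResNet stem could look in PyTorch. It replaces the standard ResNet stem (a 7x7 stride-2 convolution followed by a stride-2 max pool) with a "root block" of stacked 3x3 stride-1 convolutions, so the first residual stage sees the input at full resolution and small objects retain more spatial detail. Using three root convolutions follows the paper's description; the channel width and the rest of the wiring are assumptions for illustration.

```python
# Hedged sketch of a Root-ResNet-style stem: no downsampling in the stem,
# so early feature maps keep the full input resolution.
import torch
import torch.nn as nn

class RootResNetStem(nn.Module):
    def __init__(self, out_ch=64, num_root_convs=3):
        super().__init__()
        layers, in_ch = [], 3
        for _ in range(num_root_convs):
            layers += [
                nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=1, padding=1, bias=False),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True),
            ]
            in_ch = out_ch
        self.root = nn.Sequential(*layers)  # stride 1 throughout: no downsampling

    def forward(self, x):
        # Spatial size is preserved; downsampling is deferred to later stages.
        return self.root(x)

stem = RootResNetStem()
feat = stem(torch.randn(1, 3, 300, 300))  # -> shape [1, 64, 300, 300]
```

The design choice is that downsampling is pushed out of the stem and into the later residual stages, trading extra computation on large feature maps for finer detail where small objects are still resolvable.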

Speculative Future Directions

This research paves the way for rethinking detector architecture design, offering increased flexibility devoid of pretrained model constraints. Future work in this domain might explore the integration of more sophisticated normalization techniques, potentially enhancing convergence rates and overall detection performance further. Additionally, the approach could be extrapolated to other tasks like semantic segmentation, where pretrained networks also dominate.

Theoretical and Practical Implications

Theoretically, this work challenges the entrenched need for pretrained networks in object detection, proposing that with suitable architectural adjustments and training strategies, comparable or even superior performance is achievable. Practically, the paper offers valuable insights into detector design and training best practices, potentially reducing reliance on large-scale pretraining datasets and the computational burden that comes with them.

In conclusion, "ScratchDet" is a noteworthy advancement in the object detection domain, offering a framework that emphasizes architectural and methodological innovations over dependency on existing pretrained models. The integration of BatchNorm as a primary mechanism for improved training dynamics underscores the importance of reevaluating optimization strategies within deep learning, paving the path for further exploration and innovation.