
MegDet: A Large Mini-Batch Object Detector (1711.07240v4)

Published 20 Nov 2017 in cs.CV

Abstract: The improvements in recent CNN-based object detection works, from R-CNN [11], Fast/Faster R-CNN [10, 31] to recent Mask R-CNN [14] and RetinaNet [24], mainly come from new network, new framework, or novel loss design. But mini-batch size, a key factor in the training, has not been well studied. In this paper, we propose a Large MiniBatch Object Detector (MegDet) to enable the training with much larger mini-batch size than before (e.g. from 16 to 256), so that we can effectively utilize multiple GPUs (up to 128 in our experiments) to significantly shorten the training time. Technically, we suggest a learning rate policy and Cross-GPU Batch Normalization, which together allow us to successfully train a large mini-batch detector in much less time (e.g., from 33 hours to 4 hours), and achieve even better accuracy. The MegDet is the backbone of our submission (mmAP 52.5%) to COCO 2017 Challenge, where we won the 1st place of Detection task.

Citations (314)

Summary

  • The paper introduces a large mini-batch training strategy, using mini-batches of up to 256 images across up to 128 GPUs to reduce training time from 33 hours to 4 while maintaining high accuracy.
  • It incorporates a warmup learning rate policy and cross-GPU batch normalization to stabilize and optimize the training process at scale.
  • The implementation achieved a 52.5% mmAP at the COCO 2017 Challenge, underscoring its effectiveness in advancing state-of-the-art object detection.

Overview of "MegDet: A Large Mini-Batch Object Detector"

The paper "MegDet: A Large Mini-Batch Object Detector" by Chao Peng et al. introduces a significant advance in the field of object detection within the context of deep learning architectures. It specifically investigates the role of mini-batch sizes in the training of object detectors, a topic that has received limited attention in prior research. The authors propose a framework called MegDet, which supports large mini-batch training, achieving substantial reductions in training time without compromising accuracy.

Key Contributions

  1. Large Mini-Batch Training: The authors analyze the impact of mini-batch sizes on object detection, contrasting with the generally small mini-batch sizes used in popular detectors like Faster R-CNN and Mask R-CNN. MegDet allows for a mini-batch size of up to 256, utilizing the computational power of 128 GPUs. This transition drastically shortens training time, demonstrating a reduction from 33 hours to just 4 hours, while maintaining high accuracy levels.
  2. Warmup Learning Rate Policy and Cross-GPU Batch Normalization (CGBN): Two key techniques underpin the success of MegDet in managing large-scale mini-batches:
    • Warmup Learning Rate Policy: The learning rate starts small and is increased gradually over the initial training iterations toward the target rate implied by the larger mini-batch, keeping early optimization stable (see the first sketch after this list).
    • CGBN: The authors introduce Cross-GPU Batch Normalization, which aggregates batch-normalization statistics across all GPUs, so normalization is computed over the full mini-batch rather than the small per-GPU slice. This avoids the noisy statistics produced by small per-device batches and stabilizes training at scale (see the second sketch after this list).
  3. Numerical Results: MegDet achieved an mmAP of 52.5% at the COCO 2017 Challenge, earning first place in the Detection task. The ability to train models in hours rather than days also shortens the iteration cycle for developing object detection frameworks.
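
The warmup policy pairs naturally with the linear scaling rule: when the mini-batch grows by a factor k, the target learning rate grows by k as well, but starting at that large rate immediately tends to diverge. Below is a minimal sketch in PyTorch of how such a schedule can be driven; the constants (base rate 0.02, base batch 16, 500 warmup iterations) are illustrative placeholders, not values prescribed by the paper.

```python
import torch

def warmup_lr(step, *, base_lr=0.02, base_batch=16, batch_size=256,
              warmup_iters=500):
    """Learning rate at `step`: ramp linearly from base_lr up to the
    linearly scaled target rate base_lr * batch_size / base_batch."""
    target_lr = base_lr * batch_size / base_batch  # linear scaling rule
    if step >= warmup_iters:
        return target_lr
    return base_lr + (target_lr - base_lr) * step / warmup_iters

# Usage with any PyTorch optimizer (a toy parameter stands in for a model):
params = [torch.nn.Parameter(torch.zeros(1))]
opt = torch.optim.SGD(params, lr=0.02, momentum=0.9)
for step in range(1000):
    for group in opt.param_groups:
        group["lr"] = warmup_lr(step)
    # ... forward pass, backward pass, opt.step() ...
```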
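
Cross-GPU Batch Normalization amounts to pooling the per-channel sums and squared sums across devices before normalizing. The sketch below illustrates the idea with torch.distributed, assuming an initialized process group and an equal number of images per GPU; PyTorch's torch.nn.SyncBatchNorm is a production implementation of the same pattern.

```python
import torch
import torch.distributed as dist

def cross_gpu_batch_norm(x, weight, bias, eps=1e-5):
    """Normalize x (N, C, H, W) with statistics pooled across all GPUs.
    Assumes dist.init_process_group() has been called and every rank
    contributes the same number of elements per channel."""
    n = x.numel() / x.size(1)                 # elements per channel on this GPU
    local_sum = x.sum(dim=(0, 2, 3))
    local_sqsum = (x * x).sum(dim=(0, 2, 3))
    stats = torch.stack([local_sum, local_sqsum])
    dist.all_reduce(stats)                    # sum the statistics over all GPUs
    count = n * dist.get_world_size()
    mean = stats[0] / count
    var = stats[1] / count - mean * mean      # E[x^2] - E[x]^2
    x_hat = (x - mean.view(1, -1, 1, 1)) / torch.sqrt(var.view(1, -1, 1, 1) + eps)
    return x_hat * weight.view(1, -1, 1, 1) + bias.view(1, -1, 1, 1)
```

Note that only the 2×C pooled statistics cross devices, so the extra communication per normalization layer is a single small all-reduce, which keeps the approach cheap even at 128 GPUs.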

Implications and Future Directions

The implications of this research are both practical and theoretical. Practically, it cuts the time and computational cost of training large-scale object detection models, enabling more rapid iteration in research and deployment. Theoretically, it challenges existing assumptions about mini-batch sizes and learning rates, inviting further exploration of optimization techniques for neural network training.

Future research may examine how large mini-batch training affects the generalization and convergence properties of deep learning models across diverse applications. Extending these techniques to other domains within machine learning could likewise uncover additional efficiencies and insights.

This paper marks a pivotal point in the discussion around training efficiency and computational resource utilization in deep learning, offering foundational insights that can inform future developments in AI and machine learning systems.