
MMDetection: Open MMLab Detection Toolbox and Benchmark (1906.07155v1)

Published 17 Jun 2019 in cs.CV, cs.LG, and eess.IV

Abstract: We present MMDetection, an object detection toolbox that contains a rich set of object detection and instance segmentation methods as well as related components and modules. The toolbox started from a codebase of MMDet team who won the detection track of COCO Challenge 2018. It gradually evolves into a unified platform that covers many popular detection methods and contemporary modules. It not only includes training and inference codes, but also provides weights for more than 200 network models. We believe this toolbox is by far the most complete detection toolbox. In this paper, we introduce the various features of this toolbox. In addition, we also conduct a benchmarking study on different methods, components, and their hyper-parameters. We wish that the toolbox and benchmark could serve the growing research community by providing a flexible toolkit to reimplement existing methods and develop their own new detectors. Code and models are available at https://github.com/open-mmlab/mmdetection. The project is under active development and we will keep this document updated.

Citations (2,632)

Summary

  • The paper introduces MMDetection, a modular toolbox that unifies diverse object detection and instance segmentation frameworks with state-of-the-art performance.
  • The authors detail a flexible, PyTorch-based architecture that separates components like Backbone, Neck, and RoIHead for easy customization.
  • Benchmarking on COCO 2017 shows competitive training speeds, efficient memory usage, and high accuracy across multiple detection models.

MMDetection: Open MMLab Detection Toolbox and Benchmark

The paper "MMDetection: Open MMLab Detection Toolbox and Benchmark" presents an object detection and instance segmentation toolbox that gives researchers a common platform for developing and benchmarking detection models. It aims to standardize object detection research through a modular, flexible, and efficient codebase built on PyTorch. The sections below cover its features, supported methods, architecture, benchmarking results, and studies of hyper-parameters and other components.

Features of MMDetection

MMDetection stands out due to several key features:

  1. Modular Design: The detection framework is decomposed into interchangeable components, so a custom detector can be built by combining different modules.
  2. Support for Multiple Frameworks: MMDetection supports a wide range of popular and contemporary detection frameworks.
  3. High Efficiency: All basic bounding box (bbox) and mask operations run on GPUs, and training speed is competitive with other codebases.
  4. State of the Art: Originating from the winning codebase of the 2018 COCO challenge, it continues to evolve and integrate leading methodologies.
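The modular design can be illustrated with a small config-driven registry. This is a minimal sketch, not MMDetection's actual API: `REGISTRY`, `register`, `build`, and the `Toy*` classes are hypothetical names chosen for illustration, and the components return strings instead of tensors.

```python
from typing import Callable, Dict

# Hypothetical registry for illustration only; not MMDetection's actual API.
REGISTRY: Dict[str, Callable] = {}


def register(cls):
    """Record a component class under its class name."""
    REGISTRY[cls.__name__] = cls
    return cls


def build(cfg: dict):
    """Instantiate the component named by cfg['type'] with the remaining keys."""
    args = dict(cfg)  # copy so the caller's config is not mutated
    name = args.pop("type")
    return REGISTRY[name](**args)


@register
class ToyBackbone:
    def __init__(self, depth: int):
        self.depth = depth

    def __call__(self, image: str) -> str:
        return f"features(depth={self.depth}) of {image}"


@register
class ToyNeck:
    def __init__(self, out_channels: int):
        self.out_channels = out_channels

    def __call__(self, feats: str) -> str:
        return f"pyramid({self.out_channels}) over {feats}"


class ToyDetector:
    """Chains whichever backbone and neck the configs name."""

    def __init__(self, backbone: dict, neck: dict):
        self.backbone = build(backbone)
        self.neck = build(neck)

    def __call__(self, image: str) -> str:
        return self.neck(self.backbone(image))


detector = ToyDetector(
    backbone={"type": "ToyBackbone", "depth": 50},
    neck={"type": "ToyNeck", "out_channels": 256},
)
result = detector("img.jpg")
print(result)
```

Swapping a component then means changing only the `type` string and its arguments in the config, while the detector class stays untouched.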

Supported Frameworks

MMDetection includes a diverse array of object detection and instance segmentation architectures:

  • Single-stage Methods: These include classic and high-performance models like SSD (Single Shot Multibox Detector), RetinaNet, FCOS (Fully Convolutional One-Stage Object Detection), and others.
  • Two-stage Methods: Among these are widely used models such as Faster R-CNN, Mask R-CNN, Double-Head R-CNN, and so forth.
  • Multi-stage Methods: For example, Cascade R-CNN and Hybrid Task Cascade.
  • General Modules and Methods: These include mechanisms like Mixed Precision Training, Generalized Attention, Soft NMS (Non-Maximum Suppression), and others.
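As one concrete example of the general modules listed above, Soft NMS can be sketched in a few lines. The version below uses the Gaussian decay variant and pure Python. It is an illustrative sketch, not the toolbox's GPU implementation, and the parameter names `sigma` and `score_thr` and their defaults are assumptions.

```python
import math
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def soft_nms(boxes: List[Box], scores: List[float],
             sigma: float = 0.5, score_thr: float = 0.001):
    """Return (box_index, decayed_score) pairs in selection order.

    Instead of discarding boxes that overlap the current best box,
    their scores are multiplied by exp(-iou^2 / sigma).
    """
    remaining = list(enumerate(scores))
    kept = []
    while remaining:
        # Select the highest-scoring box that is still in play.
        best = max(range(len(remaining)), key=lambda k: remaining[k][1])
        idx, score = remaining.pop(best)
        kept.append((idx, score))
        # Decay the others according to their overlap with the selected box.
        decayed = []
        for j, s in remaining:
            s *= math.exp(-(iou(boxes[idx], boxes[j]) ** 2) / sigma)
            if s >= score_thr:
                decayed.append((j, s))
        remaining = decayed
    return kept


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = soft_nms(boxes, scores)
print(kept)
```

In this example the second box overlaps the first heavily, so its score is reduced and it is selected last rather than removed outright, which is what hard NMS with a typical IoU threshold of 0.5 would do.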

Architecture

The paper provides a detailed description of the architectural components and training pipeline:

  • Model Components: The model is divided into Backbone, Neck, DenseHead (either anchor-based or anchor-free), RoIExtractor, and RoIHead.
  • Training Pipeline: The training pipeline is built on a hook mechanism that lets users run custom operations at defined points during training, which keeps the pipeline extensible and easy to customize.
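The hook mechanism can be sketched as a runner that calls named callbacks at fixed points in the training loop. This is a simplified illustration under stated assumptions: `Hook`, `ToyRunner`, `LossLoggerHook`, and the four callback names are hypothetical, and a lambda stands in for a real training step.

```python
class Hook:
    """Base class: every callback is a no-op unless a subclass overrides it."""

    def before_run(self, runner): pass
    def before_iter(self, runner): pass
    def after_iter(self, runner): pass
    def after_run(self, runner): pass


class LossLoggerHook(Hook):
    """Records the loss every `interval` iterations."""

    def __init__(self, interval: int):
        self.interval = interval
        self.records = []

    def after_iter(self, runner):
        if (runner.iter + 1) % self.interval == 0:
            self.records.append((runner.iter + 1, runner.loss))


class ToyRunner:
    """Hypothetical runner: owns the loop and dispatches to registered hooks."""

    def __init__(self, step_fn, max_iters: int):
        self.step_fn = step_fn  # stand-in for one forward/backward pass
        self.max_iters = max_iters
        self.hooks = []
        self.iter = 0
        self.loss = None

    def register_hook(self, hook: Hook):
        self.hooks.append(hook)

    def call_hook(self, name: str):
        for hook in self.hooks:
            getattr(hook, name)(self)

    def run(self):
        self.call_hook("before_run")
        for i in range(self.max_iters):
            self.iter = i
            self.call_hook("before_iter")
            self.loss = self.step_fn(i)
            self.call_hook("after_iter")
        self.call_hook("after_run")


logger = LossLoggerHook(interval=2)
runner = ToyRunner(step_fn=lambda i: 1.0 / (i + 1), max_iters=4)
runner.register_hook(logger)
runner.run()
print(logger.records)
```

Because the loop itself never changes, behaviour such as logging, checkpointing, or learning-rate updates can be added by registering another hook rather than editing the runner.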

Benchmarking Results

The benchmarking study covers different models and reports their inference speed, memory usage, and accuracy on the COCO 2017 dataset. It also compares MMDetection with other prominent codebases (Detectron, maskrcnn-benchmark, and SimpleDet).

  1. Performance: The paper reports bounding box Average Precision (AP) and mask AP across different backbone architectures, showing that the toolbox supports high-performance detectors.
  2. Training Efficiency: The paper reports results for mixed precision training, which reduces memory usage while maintaining or improving training speed across various models.

Detailed Studies on Hyper-parameters

To support strong and reproducible results, the paper includes detailed studies of several key hyper-parameters and components:

  • Regression Losses: Various loss functions like Smooth L1, L1, IoU, and GIoU were evaluated, highlighting each function's impact on the performance of Faster R-CNN.
  • Normalization Layers: The paper compares Batch Normalization (BN), Synchronized BN (SyncBN), and Group Normalization (GN), and reports that adding convolution layers together with appropriate normalization layers can improve model performance.
  • Training Scales: The paper investigates different image resizing strategies for training, including fixed and multi-scale approaches, revealing the impact on object detection performance.
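To make the regression losses concrete, the sketch below computes L1, Smooth L1, IoU, and GIoU losses for a pair of boxes in pure Python. It rests on several assumptions: losses are applied to raw corner coordinates for readability (detectors often regress encoded offsets instead), the IoU and GIoU losses are written as 1 - IoU and 1 - GIoU, which are common forms but not necessarily the exact ones used in the paper's experiments, and `beta` is an illustrative Smooth L1 parameter.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed non-degenerate


def l1_loss(pred: Box, target: Box) -> float:
    return sum(abs(p - t) for p, t in zip(pred, target))


def smooth_l1_loss(pred: Box, target: Box, beta: float = 1.0) -> float:
    """Quadratic for small errors (|d| < beta), linear otherwise."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        total += 0.5 * d * d / beta if d < beta else d - 0.5 * beta
    return total


def _area(b: Box) -> float:
    return (b[2] - b[0]) * (b[3] - b[1])


def iou_and_giou(a: Box, b: Box) -> Tuple[float, float]:
    """Return (IoU, GIoU); GIoU subtracts the unused share of the enclosing box."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = _area(a) + _area(b) - inter
    # Area of the smallest axis-aligned box enclosing both inputs.
    enclose = (max(a[2], b[2]) - min(a[0], b[0])) * (max(a[3], b[3]) - min(a[1], b[1]))
    iou = inter / union
    return iou, iou - (enclose - union) / enclose


def iou_loss(pred: Box, target: Box) -> float:
    return 1.0 - iou_and_giou(pred, target)[0]


def giou_loss(pred: Box, target: Box) -> float:
    return 1.0 - iou_and_giou(pred, target)[1]


overlapping = ((0.0, 0.0, 2.0, 2.0), (1.0, 1.0, 3.0, 3.0))
disjoint = ((0.0, 0.0, 1.0, 1.0), (2.0, 2.0, 3.0, 3.0))
for name, (pred, target) in [("overlapping", overlapping), ("disjoint", disjoint)]:
    print(name, l1_loss(pred, target), smooth_l1_loss(pred, target),
          iou_loss(pred, target), giou_loss(pred, target))
```

For the disjoint pair the IoU loss saturates at 1 regardless of how far apart the boxes are, while the GIoU loss still grows with the gap, which is one reason the two can behave differently during training.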

Conclusion and Future Work

MMDetection serves as a robust and flexible platform for object detection research, setting a high standard for benchmarking and experimentation. It provides a substantial foundation for future research in object detection and instance segmentation, allowing researchers to efficiently reimplement existing methods and develop new ones. Given its modularity and comprehensive support for a breadth of current frameworks, it is poised to facilitate ongoing advancements within the computer vision community.

Overall, MMDetection promotes consistent evaluation and simplifies the implementation of state-of-the-art detection models. The project remains under active development.
