DeNet: Scalable Real-time Object Detection with Directed Sparse Sampling (1703.10295v3)

Published 30 Mar 2017 in cs.CV

Abstract: We define the object detection from imagery problem as estimating a very large but extremely sparse bounding box dependent probability distribution. Subsequently we identify a sparse distribution estimation scheme, Directed Sparse Sampling, and employ it in a single end-to-end CNN based detection model. This methodology extends and formalizes previous state-of-the-art detection models with an additional emphasis on high evaluation rates and reduced manual engineering. We introduce two novelties, a corner based region-of-interest estimator and a deconvolution based CNN model. The resulting model is scene adaptive, does not require manually defined reference bounding boxes and produces highly competitive results on MSCOCO, Pascal VOC 2007 and Pascal VOC 2012 with real-time evaluation rates. Further analysis suggests our model performs particularly well when fine-grained object localization is desirable. We argue that this advantage stems from the significantly larger set of available regions-of-interest relative to other methods. Source-code is available from: https://github.com/lachlants/denet

Authors (2)

Lachlan Tychsen-Smith (8 papers)
Lars Petersson (88 papers)

Citations (113)

Summary

An Overview of "DeNet: Scalable Real-time Object Detection with Directed Sparse Sampling"

The paper "DeNet: Scalable Real-time Object Detection with Directed Sparse Sampling" introduces DeNet, a convolutional neural network (CNN)-based model that leverages directed sparse sampling for object detection. The research focuses on optimizing both detection performance and real-time evaluation rates, proposing novel methodologies that extend beyond traditional detection paradigms.

Methodological Advancements

The paper defines the object detection task as estimating a very large but extremely sparse probability distribution over bounding boxes. To estimate it, the authors propose Directed Sparse Sampling (DSS) within a single end-to-end CNN framework. The model introduces two novelties: a corner-based region-of-interest (RoI) estimator and a deconvolution-based CNN model. Together these reduce manual engineering and make the model scene adaptive.

DSS employs a two-stage CNN that first estimates probable locations for potential object detections and then classifies those locations. This approach merges the advantages of sparse region-based methods and dense non-region-based approaches. Notably, the methodology does not rely on manually defined reference bounding boxes, which enhances scalability and adaptability to varied datasets.
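To make the corner-based RoI idea more concrete, the following is a minimal sketch, not the authors' implementation. It assumes the first stage outputs a probability map per corner type, and it pairs confident top-left and bottom-right corners into candidate boxes. The function name `corner_pair_rois`, the corner ordering, the threshold, and the scoring rule (the product of the four corner probabilities) are illustrative assumptions.

```python
import numpy as np

def corner_pair_rois(corner_prob, threshold=0.5, max_rois=100):
    """Pair corner detections into candidate boxes (illustrative sketch).

    corner_prob: array of shape (4, H, W) holding per-cell probabilities
    for the corner types, in the ASSUMED order
    (top-left, top-right, bottom-left, bottom-right).
    Returns a list of (score, x0, y0, x1, y1), highest score first.
    """
    tl, tr, bl, br = corner_prob
    # Candidate corners are cells whose probability exceeds the threshold.
    tl_pts = np.argwhere(tl > threshold)  # rows of (y, x)
    br_pts = np.argwhere(br > threshold)
    rois = []
    for y0, x0 in tl_pts:
        for y1, x1 in br_pts:
            # A valid box needs its bottom-right corner strictly below
            # and to the right of its top-left corner.
            if y1 <= y0 or x1 <= x0:
                continue
            # ASSUMED score: product of the probabilities at all four corners.
            score = tl[y0, x0] * tr[y0, x1] * bl[y1, x0] * br[y1, x1]
            rois.append((float(score), int(x0), int(y0), int(x1), int(y1)))
    rois.sort(reverse=True)
    return rois[:max_rois]

# Usage with a hypothetical 8x8 corner map containing one clear box.
prob = np.zeros((4, 8, 8))
prob[0, 1, 2] = 0.9  # top-left at (x=2, y=1)
prob[1, 1, 6] = 0.8  # top-right at (x=6, y=1)
prob[2, 5, 2] = 0.7  # bottom-left at (x=2, y=5)
prob[3, 5, 6] = 0.9  # bottom-right at (x=6, y=5)
rois = corner_pair_rois(prob)
```

In this sketch no reference boxes are predefined: the set of candidates is determined entirely by where corners are detected in the image, which is the sense in which such a scheme adapts to the scene.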

Strong Numerical Results

The DeNet model produced competitive detection results on MSCOCO, Pascal VOC 2007, and Pascal VOC 2012, with evaluation rates of up to 83 Hz. The DeNet-101 variant attained a mean average precision (MAP) of 31.9% on MSCOCO test-dev2015 at 34 Hz, outperforming other models with similar or slower evaluation rates. Relative to SSD and YOLO, the advantage is most pronounced where fine-grained object localization is required.

Theoretical and Practical Implications

The corner-based RoI detector and the use of deconvolution layers are the paper's main contributions to object detection methodology. Together they offer a scalable option for real-time settings without sacrificing detection accuracy. The authors argue that DeNet's fine localization advantage over dense methods such as YOLO and SSD stems from its significantly larger set of available regions-of-interest.
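A rough count illustrates why a corner-defined sampling space is so much larger than a fixed set of reference boxes. The sketch below is illustrative arithmetic only: the 64x64 grid and the 9 reference boxes per cell are hypothetical values, not figures taken from the paper, and both function names are invented for this example.

```python
from math import comb

def corner_pair_box_count(width, height):
    """Number of distinct boxes whose corners lie on a width x height grid.

    A box is fixed by choosing two distinct columns and two distinct rows,
    so the count is C(width, 2) * C(height, 2).
    """
    return comb(width, 2) * comb(height, 2)

def reference_box_count(width, height, boxes_per_cell):
    """Number of boxes when each grid cell carries a fixed set of reference boxes."""
    return width * height * boxes_per_cell

# Hypothetical 64x64 grid, with 9 reference boxes per cell for comparison.
corner_boxes = corner_pair_box_count(64, 64)      # 4064256
reference_boxes = reference_box_count(64, 64, 9)  # 36864
```

Under these assumed settings the corner-defined space holds roughly a hundred times more candidate boxes. Only a small, sampled subset of them would ever be classified, which is consistent with the paper's framing of detection as a very large but extremely sparse distribution.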

Theoretically, the paper advances the understanding of sparsity in object detection by showing how sparse distribution estimation can be integrated into a deep learning model. Practically, because DeNet does not require manually defined reference bounding boxes, it can be adapted more simply to new datasets with varying object sizes and aspect ratios.

Future Directions and Speculations

Based on these results, future work could explore further optimization of the DSS framework, possibly by integrating more sophisticated sampling strategies or adaptive learning mechanisms. Research could also focus on reducing the time spent on CPU-bound RoI generation. Models like DeNet could be adapted for applications such as robotics and autonomous vehicles, where real-time detection with accurate localization is paramount.

In conclusion, this paper makes notable contributions by balancing real-time processing requirements with high detection performance, as evidenced by DeNet’s strong results across prominent object detection benchmarks. Its methodological innovations pave the way for more robust and adaptable object detection systems in real-world applications.
