YOGA: Deep Object Detection in the Wild with Lightweight Feature Learning and Multiscale Attention (2307.05945v1)
Abstract: We introduce YOGA, a deep-learning-based yet lightweight object detection model that can operate on low-end edge devices while still achieving competitive accuracy. The YOGA architecture consists of a two-phase feature learning pipeline with a cheap linear transformation, which learns feature maps using only half of the convolution filters required by conventional convolutional neural networks. In addition, it performs multi-scale feature fusion in its neck using an attention mechanism instead of the naive concatenation used by conventional detectors. YOGA is a flexible model that can easily be scaled up or down by several orders of magnitude to fit a broad range of hardware constraints. We evaluate YOGA on the COCO-val and COCO-testdev datasets against more than ten state-of-the-art object detectors. The results show that YOGA strikes the best trade-off between model size and accuracy (up to a 22% increase in AP with a 23-34% reduction in parameters and FLOPs), making it an ideal choice for deployment in the wild on low-end edge devices. This is further affirmed by our hardware implementation and evaluation on the NVIDIA Jetson Nano.
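The abstract names two mechanisms: a feature block that produces half of its output maps with standard convolution filters and derives the rest through a cheap linear transformation, and an attention-based fusion of multi-scale features in the neck in place of plain concatenation. The following is a minimal PyTorch sketch of what such components could look like, not the authors' implementation: it assumes a Ghost-module-style cheap transform (GhostNet is among the cited works) and a simple channel-attention gate for fusion. All module names, kernel sizes, and the reduction ratio are illustrative assumptions.

```python
# Hypothetical sketch of the two ideas described in the abstract; names and
# hyper-parameters are assumptions, not the paper's actual architecture.
import torch
import torch.nn as nn


class CheapFeatureBlock(nn.Module):
    """Produces out_ch feature maps using only out_ch // 2 standard conv filters."""

    def __init__(self, in_ch: int, out_ch: int, kernel_size: int = 3):
        super().__init__()
        half = out_ch // 2
        # Primary (expensive) convolution: half the filters of a conventional layer.
        self.primary = nn.Sequential(
            nn.Conv2d(in_ch, half, kernel_size, padding=kernel_size // 2, bias=False),
            nn.BatchNorm2d(half),
            nn.SiLU(inplace=True),
        )
        # Cheap linear transformation: a depthwise conv derives the remaining maps.
        self.cheap = nn.Sequential(
            nn.Conv2d(half, half, kernel_size, padding=kernel_size // 2,
                      groups=half, bias=False),
            nn.BatchNorm2d(half),
            nn.SiLU(inplace=True),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        primary = self.primary(x)
        return torch.cat([primary, self.cheap(primary)], dim=1)


class AttentionFusion(nn.Module):
    """Fuses two same-shape feature maps with learned channel-attention weights."""

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.SiLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        w = self.gate(a + b)           # per-channel weights in [0, 1]
        return w * a + (1.0 - w) * b   # weighted fusion instead of concatenation


if __name__ == "__main__":
    x = torch.randn(1, 64, 80, 80)
    block = CheapFeatureBlock(64, 128)
    fuse = AttentionFusion(128)
    y = block(x)
    print(y.shape)                                  # torch.Size([1, 128, 80, 80])
    print(fuse(y, torch.randn_like(y)).shape)       # torch.Size([1, 128, 80, 80])
```

In this sketch the depthwise branch gives the parameter and FLOP savings (roughly half the cost of a full convolution for the same output width), while the attention gate lets the network weight the contribution of each scale rather than simply stacking channels; both points are consistent with, but not taken from, the paper's description.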