YOGA: Deep Object Detection in the Wild with Lightweight Feature Learning and Multiscale Attention (2307.05945v1)

Published 12 Jul 2023 in cs.CV and cs.LG

Abstract: We introduce YOGA, a deep-learning-based yet lightweight object detection model that can operate on low-end edge devices while still achieving competitive accuracy. The YOGA architecture uses a two-phase feature learning pipeline with a cheap linear transformation, which learns feature maps using only half of the convolution filters required by conventional convolutional neural networks. In addition, its neck performs multi-scale feature fusion with an attention mechanism instead of the naive concatenation used by conventional detectors. YOGA is flexible and can easily be scaled up or down by several orders of magnitude to fit a broad range of hardware constraints. We evaluate YOGA on the COCO-val and COCO-testdev datasets against more than 10 other state-of-the-art object detectors. The results show that YOGA strikes the best trade-off between model size and accuracy (up to a 22% increase in AP and a 23–34% reduction in parameters and FLOPs), making it an ideal choice for deployment in the wild on low-end edge devices. This is further confirmed by our hardware implementation and evaluation on the NVIDIA Jetson Nano.
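The claim that the two-phase pipeline needs only half of the convolution filters can be checked with simple parameter arithmetic. The sketch below assumes a GhostNet-style design: half of the output maps come from an ordinary k × k convolution, and the other half from a cheap d × d depthwise transform of those maps. The exact operator YOGA uses, and the function names here, are illustrative assumptions, not the paper's code.

```python
"""Parameter/MAC comparison: standard conv vs. a two-phase (GhostNet-style) block.

Assumptions (not taken from the YOGA paper): no bias terms, stride 1,
"same" padding, and the cheap linear transformation is a d x d depthwise
convolution applied to each primary feature map.
"""


def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    # Every one of the c_out filters spans all c_in input channels.
    return c_in * c_out * k * k


def two_phase_params(c_in: int, c_out: int, k: int, d: int) -> int:
    if c_out % 2 != 0:
        raise ValueError("c_out must be even so it can be split in half")
    # Phase 1: an ordinary convolution with only half of the filters.
    primary_out = c_out // 2
    primary = c_in * primary_out * k * k
    # Phase 2: a cheap depthwise d x d transform turns each primary map
    # into one extra map, restoring the full c_out channels.
    cheap = primary_out * d * d
    return primary + cheap


def macs(params: int, h: int, w: int) -> int:
    # With stride 1 and same padding, each weight (standard or depthwise)
    # is used once per output pixel.
    return params * h * w


if __name__ == "__main__":
    std = standard_conv_params(64, 128, 3)
    two = two_phase_params(64, 128, 3, 3)
    print(std, two, round(two / std, 3))  # 73728 37440 0.508
```

For a 64 → 128 channel 3 × 3 layer, the two-phase block needs 37,440 weights instead of 73,728, about 51% of the original; the small overhead above one half comes from the depthwise transform. Multiply-accumulate counts scale the same way for a given output resolution.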
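The difference between concatenation and attention-based fusion in the neck can be illustrated with a toy, weight-free sketch. Here a per-channel sigmoid gate, computed from the global average of the two inputs, softly selects between them. This gate is a hypothetical stand-in for a learned attention module and is not taken from the YOGA paper; feature maps are plain nested lists so the example needs only the standard library.

```python
"""Concatenation vs. attention-weighted fusion of two feature maps.

Assumes both inputs already share the same channel count and spatial size.
The gate is a hypothetical, weight-free stand-in for a learned attention
module. A feature map is a list of channels; each channel is a flat list
of pixel values.
"""
import math
from typing import List

FeatureMap = List[List[float]]


def concat_fusion(x: FeatureMap, y: FeatureMap) -> FeatureMap:
    # Naive fusion: stack channels, so the output has twice as many channels.
    return x + y


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def attention_fusion(x: FeatureMap, y: FeatureMap) -> FeatureMap:
    if len(x) != len(y):
        raise ValueError("inputs must have the same number of channels")
    fused: FeatureMap = []
    for cx, cy in zip(x, y):
        # Channel descriptor: global average of the summed inputs.
        s = sum(a + b for a, b in zip(cx, cy)) / len(cx)
        g = sigmoid(s)  # gate in (0, 1)
        # Soft selection: weight g goes to x, (1 - g) goes to y.
        fused.append([g * a + (1.0 - g) * b for a, b in zip(cx, cy)])
    return fused


if __name__ == "__main__":
    x = [[1.0, 1.0], [0.0, 0.0]]
    y = [[0.0, 0.0], [0.0, 0.0]]
    print(len(concat_fusion(x, y)), len(attention_fusion(x, y)))  # 4 2
```

Concatenation doubles the channel count, so the next layer must be wider; attention fusion keeps the original channel count and instead decides, per channel, how much each input contributes.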

Authors (2)
