
RSUD20K: A Dataset for Road Scene Understanding In Autonomous Driving (2401.07322v2)

Published 14 Jan 2024 in cs.CV

Abstract: Road scene understanding is crucial in autonomous driving, enabling machines to perceive the visual environment. However, recent object detectors tailored for learning on datasets collected from certain geographical locations struggle to generalize across different locations. In this paper, we present RSUD20K, a new dataset for road scene understanding comprising over 20K high-resolution images captured from the driving perspective on Bangladesh roads, with 130K bounding box annotations for 13 objects. This challenging dataset encompasses diverse road scenes, including narrow streets and highways, and features objects seen from different viewpoints, crowded environments with densely cluttered objects, and various weather conditions. Our work significantly improves upon previous efforts, providing detailed annotations and increased object complexity. We thoroughly examine the dataset, benchmarking various state-of-the-art object detectors and exploring large vision models as image annotators.
