
Multiview Aerial Visual Recognition (MAVREC): Can Multi-view Improve Aerial Visual Perception? (2312.04548v1)

Published 7 Dec 2023 in cs.CV, cs.AI, and cs.LG

Abstract: Despite the commercial abundance of UAVs, aerial data acquisition remains challenging, and the existing Asia and North America-centric open-source UAV datasets are small-scale or low-resolution and lack diversity in scene contextuality. Additionally, the color content of the scenes, solar-zenith angle, and population density of different geographies influence the data diversity. These two factors conjointly render suboptimal aerial-visual perception of the deep neural network (DNN) models trained primarily on the ground-view data, including the open-world foundational models. To pave the way for a transformative era of aerial detection, we present Multiview Aerial Visual RECognition or MAVREC, a video dataset where we record synchronized scenes from different perspectives -- ground camera and drone-mounted camera. MAVREC consists of around 2.5 hours of industry-standard 2.7K resolution video sequences, more than 0.5 million frames, and 1.1 million annotated bounding boxes. This makes MAVREC the largest ground and aerial-view dataset, and the fourth largest among all drone-based datasets across all modalities and tasks. Through our extensive benchmarking on MAVREC, we recognize that augmenting object detectors with ground-view images from the corresponding geographical location is a superior pre-training strategy for aerial detection. Building on this strategy, we benchmark MAVREC with a curriculum-based semi-supervised object detection approach that leverages labeled (ground and aerial) and unlabeled (only aerial) images to enhance the aerial detection. We publicly release the MAVREC dataset: https://mavrec.github.io.

Authors (5)
  1. Aritra Dutta (26 papers)
  2. Srijan Das (35 papers)
  3. Jacob Nielsen (5 papers)
  4. Rajatsubhra Chakraborty (4 papers)
  5. Mubarak Shah (208 papers)
