Multiview Aerial Visual Recognition (MAVREC): Can Multi-view Improve Aerial Visual Perception? (2312.04548v1)
Abstract: Despite the commercial abundance of UAVs, aerial data acquisition remains challenging, and the existing Asia- and North America-centric open-source UAV datasets are small-scale or low-resolution and lack diversity in scene contextuality. Additionally, the color content of the scenes, the solar zenith angle, and the population density of different geographies influence data diversity. Together, these two factors lead to suboptimal aerial visual perception in deep neural network (DNN) models trained primarily on ground-view data, including open-world foundation models. To pave the way for a transformative era of aerial detection, we present Multiview Aerial Visual RECognition (MAVREC), a video dataset in which we record synchronized scenes from different perspectives: a ground camera and a drone-mounted camera. MAVREC consists of around 2.5 hours of industry-standard 2.7K-resolution video sequences, more than 0.5 million frames, and 1.1 million annotated bounding boxes. This makes MAVREC the largest ground- and aerial-view dataset, and the fourth largest among all drone-based datasets across all modalities and tasks. Through extensive benchmarking on MAVREC, we find that augmenting object detectors with ground-view images from the corresponding geographical location is a superior pre-training strategy for aerial detection. Building on this strategy, we benchmark MAVREC with a curriculum-based semi-supervised object detection approach that leverages labeled (ground and aerial) and unlabeled (aerial-only) images to enhance aerial detection. We publicly release the MAVREC dataset: https://mavrec.github.io.
Authors: Aritra Dutta, Srijan Das, Jacob Nielsen, Rajatsubhra Chakraborty, Mubarak Shah