- The paper presents a comprehensive benchmark dataset with 67,428 multi-modal video sequences and detailed annotations for various UAV-based human behavior tasks.
- It introduces a novel Guided Transformer I3D model to counteract fisheye distortion and enhance action recognition accuracy.
- The dataset’s diversity in modalities and settings underpins robust research in surveillance, search and rescue, and situational awareness.
UAV-Human: A Large Benchmark for Human Behavior Understanding with Unmanned Aerial Vehicles
The paper presents UAV-Human, a large and multifaceted benchmark for human behavior understanding with Unmanned Aerial Vehicles (UAVs). It addresses the limited scale and diversity of existing benchmarks by offering a dataset with greater variety in data modalities, task categories, and capture conditions. The benchmark aims to advance UAV-based human behavior understanding, an increasingly relevant area given applications such as surveillance, search and rescue, and situational awareness in challenging environments.
Dataset Composition and Methodology
The UAV-Human dataset comprises a large, varied collection of annotated data: 67,428 multi-modal video sequences from 119 subjects for action recognition, 22,476 frames for pose estimation, 41,290 frames with 1,144 identities for person re-identification, and 22,263 frames for attribute recognition. The data was recorded over three months using UAVs in urban and rural settings, during both daytime and nighttime, covering a wide range of environments, subjects, and camera motions that reflect practical use cases.
Moreover, the dataset was captured with multiple sensors, yielding several data modalities: RGB, depth, infrared, fisheye, night-vision, and skeleton sequences. This breadth allows researchers to study UAV-based human behavior understanding with a wide range of approaches, from single-modality baselines to multi-modal deep learning pipelines.
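To make the composition concrete, the sketch below shows one way such paired multi-modal clips could be organized and loaded with PyTorch. The directory layout, `.npy` storage, and label mapping are assumptions made for illustration only, not the official UAV-Human release format.

```python
import os
from typing import Dict

import numpy as np
import torch
from torch.utils.data import Dataset


class MultiModalUAVClips(Dataset):
    """Toy loader for paired multi-modal clips.

    The directory layout and .npy storage assumed here are illustrative only,
    not the official UAV-Human release format.
    """

    # Modalities discussed in the paper; skeletons are joint coordinates rather than images.
    MODALITIES = ("rgb", "fisheye", "ir", "depth", "skeleton")

    def __init__(self, root: str, labels: Dict[str, int], num_frames: int = 16):
        self.root = root
        self.clip_ids = sorted(labels)   # one id per recorded clip
        self.labels = labels             # clip id -> action class
        self.num_frames = num_frames

    def __len__(self) -> int:
        return len(self.clip_ids)

    def _load_modality(self, clip_id: str, modality: str) -> torch.Tensor:
        # Assume each modality was pre-extracted to an array of shape (T, H, W, C),
        # or (T, J, 3) for skeletons; a real pipeline would decode video instead.
        path = os.path.join(self.root, modality, f"{clip_id}.npy")
        return torch.from_numpy(np.load(path)[: self.num_frames]).float()

    def __getitem__(self, idx: int) -> Dict[str, torch.Tensor]:
        clip_id = self.clip_ids[idx]
        sample = {m: self._load_modality(clip_id, m) for m in self.MODALITIES}
        sample["label"] = torch.tensor(self.labels[clip_id])
        return sample
```

A standard `DataLoader` can then batch these samples, with each modality collated into its own tensor.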
Technical Highlights
A significant contribution of this research is the proposed fisheye-based action recognition method. Fisheye cameras introduce strong wide-angle distortion, a known obstacle for recognition models, and the paper counteracts it with a Guided Transformer I3D model. The model learns unbounded transformations guided by paired flat RGB videos, which improves action recognition on fisheye-captured data.
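The sketch below illustrates the general idea of a guided transformation for fisheye input: a small transformer predicts a per-frame warp for the fisheye clip, and during training the warped features are pulled toward features from the paired flat RGB video. It is a simplified stand-in written for illustration, not the paper's actual Guided Transformer I3D architecture; the module names and the affine warp are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GuidedRectifier(nn.Module):
    """Simplified sketch: predict a per-frame affine warp for fisheye frames,
    with temporal context shared by a small transformer. Not the paper's exact model."""

    def __init__(self, feat_dim: int = 64):
        super().__init__()
        self.encoder = nn.Sequential(  # lightweight per-frame encoder
            nn.Conv2d(3, feat_dim, 7, stride=4, padding=3), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        # Tiny transformer over the clip's frame tokens to share context across time.
        layer = nn.TransformerEncoderLayer(d_model=feat_dim, nhead=4, batch_first=True)
        self.temporal = nn.TransformerEncoder(layer, num_layers=2)
        self.to_theta = nn.Linear(feat_dim, 6)  # per-frame 2x3 affine parameters

    def forward(self, fisheye: torch.Tensor) -> torch.Tensor:
        # fisheye: (B, T, 3, H, W) -> warped clip of the same shape
        b, t, c, h, w = fisheye.shape
        frames = fisheye.reshape(b * t, c, h, w)
        tokens = self.encoder(frames).flatten(1).reshape(b, t, -1)
        tokens = self.temporal(tokens)
        theta = self.to_theta(tokens).reshape(b * t, 2, 3)
        grid = F.affine_grid(theta, list(frames.shape), align_corners=False)
        warped = F.grid_sample(frames, grid, align_corners=False)
        return warped.reshape(b, t, c, h, w)


def guidance_loss(warped_feat: torch.Tensor, flat_rgb_feat: torch.Tensor) -> torch.Tensor:
    # Paired flat RGB features guide the warp during training; only fisheye is needed at test time.
    return F.mse_loss(warped_feat, flat_rgb_feat.detach())
```

In this sketch the warped clip would then be fed to an I3D-style 3D CNN classifier; the guidance term is added to the classification loss only while paired flat RGB video is available.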
Experiments on the UAV-Human dataset show promising results for this technique, highlighting its ability to improve recognition accuracy. These findings point toward more reliable and robust systems for understanding human behavior from aerial perspectives, which is particularly challenging given the mobility and varying viewpoints inherent in UAV operations.
Implications and Future Directions
The UAV-Human benchmark is positioned to significantly impact the development and assessment of UAV-based human behavior models by providing a comprehensive, challenging test bed for multiple vision tasks. The proposed dataset will foster advancements in recognizing actions, estimating poses, identifying individuals, and recognizing attributes from UAV data.
Looking forward, progress will hinge on model architectures that handle the dataset's diverse modalities and capture conditions more effectively. Exploring methods that better fuse multi-modal data could also lead to more resilient UAV applications across environments and tasks; a simple late-fusion baseline is sketched below. The UAV-Human dataset provides a foundational platform for these next steps, promoting innovation in both theoretical and applied research on UAV-based human behavior analysis.
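As one illustration of the fusion direction, the following minimal late-fusion head combines per-modality classifier logits with learned weights. The backbones and weighting scheme are assumptions for this sketch, not a method proposed in the paper.

```python
from typing import Dict

import torch
import torch.nn as nn


class LateFusionHead(nn.Module):
    """Minimal late-fusion baseline: combine per-modality logits with learned weights.

    The per-modality backbones are placeholders; any classifier producing
    (batch, num_classes) logits could be plugged in.
    """

    def __init__(self, backbones: nn.ModuleDict):
        super().__init__()
        self.backbones = backbones  # e.g. {"rgb": ..., "ir": ..., "skeleton": ...}
        self.mix = nn.Parameter(torch.zeros(len(backbones)))

    def forward(self, inputs: Dict[str, torch.Tensor]) -> torch.Tensor:
        # Stack logits from each modality: (num_modalities, batch, num_classes).
        logits = torch.stack([self.backbones[name](inputs[name]) for name in self.backbones])
        alpha = torch.softmax(self.mix, dim=0).view(-1, 1, 1)  # learned modality weights
        return (alpha * logits).sum(dim=0)                     # weighted sum of class scores
```

More sophisticated fusion (e.g. feature-level or attention-based) is an open direction that this benchmark is well placed to evaluate.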