3D-MuPPET: 3D Multi-Pigeon Pose Estimation and Tracking (2308.15316v3)

Published 29 Aug 2023 in cs.CV and cs.LG

Abstract: Markerless methods for animal posture tracking have been rapidly developing recently, but frameworks and benchmarks for tracking large animal groups in 3D are still lacking. To overcome this gap in the literature, we present 3D-MuPPET, a framework to estimate and track 3D poses of up to 10 pigeons at interactive speed using multiple camera views. We train a pose estimator to infer 2D keypoints and bounding boxes of multiple pigeons, then triangulate the keypoints to 3D. For identity matching of individuals in all views, we first dynamically match 2D detections to global identities in the first frame, then use a 2D tracker to maintain IDs across views in subsequent frames. We achieve comparable accuracy to a state of the art 3D pose estimator in terms of median error and Percentage of Correct Keypoints. Additionally, we benchmark the inference speed of 3D-MuPPET, with up to 9.45 fps in 2D and 1.89 fps in 3D, and perform quantitative tracking evaluation, which yields encouraging results. Finally, we showcase two novel applications for 3D-MuPPET. First, we train a model with data of single pigeons and achieve comparable results in 2D and 3D posture estimation for up to 5 pigeons. Second, we show that 3D-MuPPET also works outdoors without additional annotations from natural environments. Both use cases simplify the domain shift to new species and environments, largely reducing the annotation effort needed for 3D posture tracking. To the best of our knowledge, we are the first to present a framework for 2D/3D animal posture and trajectory tracking that works in both indoor and outdoor environments for up to 10 individuals. We hope that the framework can open up new opportunities in studying animal collective behaviour and encourages further developments in 3D multi-animal posture tracking.

Citations (10)

Summary

  • The paper presents a novel framework that accurately estimates and tracks 3D pigeon poses using multi-camera inputs without relying on markers.
  • It integrates dynamic identity matching and triangulation, achieving 9.45 fps in 2D and 1.89 fps in 3D for real-time operation.
  • Results demonstrate high accuracy and adaptability under both controlled indoor and natural outdoor conditions, advancing computer vision in ethology.

Essay on 3D-MuPPET: 3D Multi-Pigeon Pose Estimation and Tracking

The paper "3D-MuPPET: 3D Multi-Pigeon Pose Estimation and Tracking" presents a framework for accurately estimating and tracking the 3D poses of multiple pigeons using a multi-camera setup. The work addresses a gap in the literature: the lack of frameworks and benchmarks for tracking large groups of animals in 3D without the constraints of marker-based systems. It makes several noteworthy contributions to computer vision applications in ethology by providing a platform that works both indoors and in natural outdoor environments.

Technical Overview

The core of the proposed framework is a pose estimation and tracking pipeline that integrates multiple interchangeable methods. The framework first estimates 2D keypoints and bounding boxes with a selected model, such as KeypointRCNN or modified versions of DeepLabCut (DLC) and ViTPose. The 2D outputs from multiple camera views are then triangulated to obtain 3D keypoints. For identity matching, a dynamic matching algorithm assigns global identities in the first frame, after which SORT maintains identities across subsequent frames. The system achieves interactive speeds of up to 9.45 fps in 2D and 1.89 fps in 3D, making it suitable for real-time applications. Notably, the framework compares favorably against Learnable Triangulation of Human Pose (LToHP), a standard in human 3D pose estimation: it achieves comparable median errors and Percentage of Correct Keypoints while offering greater operational speed and flexibility, and it requires no 3D ground truth data for network training.
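The triangulation step described above can be illustrated with the standard Direct Linear Transform (DLT). The paper's exact implementation is not reproduced here, so treat this as a minimal sketch in which `proj_mats` is assumed to hold the calibrated (3, 4) projection matrices of the cameras:

```python
import numpy as np

def triangulate_point(proj_mats, points_2d):
    """Triangulate one 3D point from its 2D observations in multiple
    views via the Direct Linear Transform (DLT).

    proj_mats : list of (3, 4) camera projection matrices
    points_2d : list of (x, y) pixel coordinates, one per view
    """
    A = []
    for P, (x, y) in zip(proj_mats, points_2d):
        # Each view contributes two linear constraints on the
        # homogeneous 3D point X: x * (P3 . X) = P1 . X, etc.
        A.append(x * P[2] - P[0])
        A.append(y * P[2] - P[1])
    A = np.asarray(A)
    # The solution is the right singular vector for the smallest
    # singular value of A.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]  # de-homogenize
```

With more than two views, the same least-squares formulation simply stacks two extra rows per camera, which is what makes the multi-view setup robust to a single noisy detection.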

Results and Implications

Quantitatively, the framework delivers high accuracy in both 2D and 3D pose estimation. Evaluations using HOTA and other multi-object tracking metrics demonstrate strong tracking performance, with high multi-object tracking accuracy and few identity switches, even under challenging real-world recording conditions. The flexibility to switch between pose estimation models according to the requirements of a given study makes the framework practical across diverse research settings.
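The two accuracy measures used for pose evaluation, median keypoint error and Percentage of Correct Keypoints (PCK), are straightforward to compute from predicted and ground-truth keypoints. The following is a sketch; the array shapes and the threshold convention are assumptions for illustration, not taken from the paper:

```python
import numpy as np

def pose_errors(pred, gt, threshold):
    """Median keypoint error and PCK over a batch of poses.

    pred, gt  : (N, K, D) arrays of N poses with K keypoints in D dims
    threshold : distance below which a keypoint counts as correct
    """
    err = np.linalg.norm(pred - gt, axis=-1)   # (N, K) Euclidean errors
    median_err = float(np.median(err))
    pck = float(np.mean(err < threshold))       # fraction of correct keypoints
    return median_err, pck
```

Median error is preferred over the mean here because occasional gross triangulation failures would otherwise dominate the statistic.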

Practical Applications and Future Research

The framework is tested under various conditions, including controlled laboratory environments and unstructured outdoor settings, with encouraging outcomes in both. Its robustness to domain shift, such as generalizing from a model trained on single pigeons to multi-pigeon tracking in 2D and 3D, suggests that manual annotation effort can be substantially reduced, which is particularly advantageous for the extensive standardization required in multi-species behavioral experiments. Moreover, its applicability to outdoor environments without additional annotations opens new avenues for studying animal behavior and movement ecology in more natural contexts.

Further improvements could include identity tracking that incorporates more sophisticated feature-based matching to address current limitations, such as the requirement that all subjects be visible in the initial frame. Additionally, incorporating advanced segmentation models to handle occlusions could further improve the robustness of the framework's tracking accuracy.
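For intuition on how a SORT-style tracker maintains identities frame to frame, consider the core association step: predicted track boxes are paired with new detections by overlap. SORT itself uses Kalman-filtered predictions and the Hungarian algorithm; the greedy IoU matching below is a deliberately simplified sketch, not the paper's implementation:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def match_detections(tracks, detections, min_iou=0.3):
    """Greedily pair detections with existing tracks by descending IoU.

    Returns a list of (track_index, detection_index) matches; unmatched
    detections would spawn new tracks in a full tracker.
    """
    pairs = sorted(
        ((iou(t, d), ti, di)
         for ti, t in enumerate(tracks)
         for di, d in enumerate(detections)),
        reverse=True)
    matches, used_t, used_d = [], set(), set()
    for score, ti, di in pairs:
        if score < min_iou or ti in used_t or di in used_d:
            continue
        matches.append((ti, di))
        used_t.add(ti)
        used_d.add(di)
    return matches
```

The feature-based matching suggested above would replace (or augment) the IoU cost with an appearance-similarity term, making re-identification possible after long occlusions or for individuals entering after the first frame.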

In conclusion, 3D-MuPPET represents significant progress towards flexible, automated multi-animal tracking systems suitable for both laboratory and field research. It builds an essential bridge between computer vision and ethology, enabling the detailed study of complex animal behaviors and group dynamics with high precision and efficiency. As such, this paper invites further work on computational tracking strategies and on adapting the framework to a broader range of animal species and ecological environments.
