TAO-Amodal: A Benchmark for Tracking Any Object Amodally (2312.12433v3)

Published 19 Dec 2023 in cs.CV, cs.AI, and cs.LG

Abstract: Amodal perception, the ability to comprehend complete object structures from partial visibility, is a fundamental skill, even for infants. Its significance extends to applications like autonomous driving, where a clear understanding of heavily occluded objects is essential. However, modern detection and tracking algorithms often overlook this critical capability, perhaps due to the prevalence of \textit{modal} annotations in most benchmarks. To address the scarcity of amodal benchmarks, we introduce TAO-Amodal, featuring 833 diverse categories in thousands of video sequences. Our dataset includes \textit{amodal} and modal bounding boxes for visible and partially or fully occluded objects, including those that are partially out of the camera frame. We investigate the current lay of the land in both amodal tracking and detection by benchmarking state-of-the-art modal trackers and amodal segmentation methods. We find that existing methods, even when adapted for amodal tracking, struggle to detect and track objects under heavy occlusion. To mitigate this, we explore simple finetuning schemes that can increase the amodal tracking and detection metrics of occluded objects by 2.1\% and 3.3\%.


Summary

  • The paper presents a novel dataset, TAO-Amodal, which augments TAO with comprehensive amodal annotations covering occluded, out-of-frame, and fully invisible objects.
  • It introduces a lightweight, plug-in Amodal Expander module that transforms conventional trackers to predict the full extent of objects.
  • The method achieves a 3.3% boost in detection and a 1.6% improvement in tracking of occluded objects, along with a roughly 2x gain on pedestrian amodal tracking relative to modal baselines.

Introduction to Amodal Perception

Amodal perception is a fundamental skill that allows us to understand the complete structure of partially visible objects. This ability is critical in real-world applications such as autonomous driving, where a clear understanding of heavily occluded objects is essential. However, modern detection and tracking algorithms often overlook amodal perception, largely because datasets with amodal annotations are scarce.

The TAO-Amodal Dataset

To address the lack of amodal data, the paper introduces TAO-Amodal, a benchmark that augments the existing TAO dataset with amodal bounding box annotations. These annotations cover occluded, out-of-frame, and even fully invisible objects. With 833 diverse categories across thousands of video sequences, TAO-Amodal provides an extensive evaluation framework for assessing the occlusion-reasoning capabilities of current trackers.
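
As a rough illustration of the annotation format, the sketch below shows what a single TAO-Amodal-style entry might look like in a COCO/TAO-flavored JSON layout, pairing a modal (visible) box with an amodal box. The field names and values are assumptions for exposition, not the dataset's exact schema.

```python
# Hypothetical TAO-Amodal-style annotation entry (field names are assumptions,
# not the dataset's exact schema). Boxes are [x, y, width, height] in pixels.
annotation = {
    "video_id": 42,
    "image_id": 1371,                       # frame within the video
    "track_id": 7,                          # identity persists across frames
    "category_id": 805,                     # one of the LVIS-derived categories
    "bbox": [412.0, 188.0, 36.0, 52.0],     # modal box: visible region only
    "amodal_bbox": [398.0, 188.0, 64.0, 95.0],  # full extent, may leave the frame
    "visibility": 0.35,                     # fraction of the object that is visible
    "out_of_frame": False,                  # does the amodal box cross the image border?
}

def is_heavily_occluded(ann, threshold=0.5):
    """Flag objects whose visible fraction falls below a threshold."""
    return ann["visibility"] < threshold

print(is_heavily_occluded(annotation))  # True
```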

Improving Amodal Tracking

To tackle amodal tracking, the paper transforms conventional trackers into amodal trackers using a lightweight plug-in module called the Amodal Expander. The expander is fine-tuned on a few hundred video sequences with occlusion-simulating data augmentation (a simplified sketch of such an augmentation follows below), yielding a 3.3% and 1.6% improvement in the detection and tracking of occluded objects, respectively. The gains are most dramatic for people, where performance roughly doubles compared to state-of-the-art modal baselines.
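
The sketch below illustrates the general idea behind occlusion-simulating augmentation: paste an object crop over a frame so that an annotated object becomes partially hidden while its amodal target stays unchanged. This is a simplified stand-in for the paper's augmentation pipeline, not its exact implementation.

```python
import numpy as np

def paste_occluder(frame, occluder, top_left):
    """Paste an RGBA occluder crop onto a frame to synthesize an occlusion.

    The pasted crop hides part of an annotated object, but that object's
    amodal box target is left untouched, so the tracker learns to predict
    the full extent from partial evidence.
    """
    y, x = top_left
    h, w = occluder.shape[:2]
    region = frame[y:y + h, x:x + w]          # view into the frame
    alpha = occluder[..., 3:4] / 255.0        # 0 = transparent, 1 = opaque
    region[:] = (1 - alpha) * region + alpha * occluder[..., :3]
    return frame

# Usage: drop a gray 60x80 occluder over part of a (synthetic) frame.
frame = np.zeros((480, 640, 3), dtype=np.float32)
occluder = np.concatenate(
    [np.full((60, 80, 3), 200.0, dtype=np.float32),   # RGB
     np.full((60, 80, 1), 255.0, dtype=np.float32)],  # alpha (fully opaque)
    axis=-1,
)
frame = paste_occluder(frame, occluder, top_left=(200, 300))
```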

Amodal Expander Module

The Amodal Expander is a class-agnostic module that expands modal bounding box predictions to the full extent of objects, including occluded parts. It is lightweight and transfers easily across classes. Training proceeds by matching modal box predictions to the modal ground truth and then applying a regression loss between the expanded (amodal) predictions and the amodal ground truth. This module yields significant gains, particularly for detecting people in amodal tracking scenarios.
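
A minimal PyTorch-style sketch of what such a class-agnostic expander head could look like is given below. The architecture, feature dimensions, and box parameterization are assumptions for illustration and do not reproduce the authors' exact implementation.

```python
import torch
import torch.nn as nn

class AmodalExpander(nn.Module):
    """Class-agnostic head that expands a modal box toward the amodal extent.

    Assumed design: pooled box features and the modal box are concatenated and
    mapped to 4 deltas (dx, dy, dw, dh) applied on top of the modal prediction.
    """

    def __init__(self, feat_dim=256, hidden_dim=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + 4, hidden_dim),
            nn.ReLU(inplace=True),
            nn.Linear(hidden_dim, 4),
        )

    def forward(self, box_feats, modal_boxes):
        # modal_boxes: (N, 4) as (cx, cy, w, h); box_feats: (N, feat_dim)
        deltas = self.mlp(torch.cat([box_feats, modal_boxes], dim=1))
        cx = modal_boxes[:, 0] + deltas[:, 0] * modal_boxes[:, 2]
        cy = modal_boxes[:, 1] + deltas[:, 1] * modal_boxes[:, 3]
        w = modal_boxes[:, 2] * torch.exp(deltas[:, 2])
        h = modal_boxes[:, 3] * torch.exp(deltas[:, 3])
        return torch.stack([cx, cy, w, h], dim=1)  # amodal boxes, may exceed the frame

# Training sketch: match modal predictions to modal ground truth (matching code
# omitted), then regress the expanded boxes against the amodal ground truth.
expander = AmodalExpander()
box_feats = torch.randn(8, 256)
modal_boxes = torch.rand(8, 4) * 100
amodal_gt = modal_boxes * 1.2                  # placeholder targets
loss = nn.functional.l1_loss(expander(box_feats, modal_boxes), amodal_gt)
loss.backward()
```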

Conclusion

The TAO-Amodal dataset and the Amodal Expander represent crucial advancements in the field of object detection and tracking. They bring attention to amodal perception and provide a framework for developing algorithms that better understand occluded and out-of-frame objects. The research showcases that with the right dataset and methodologies, amodal perception can be significantly improved, which is vital for applications requiring an accurate interpretation of the visual world.
