
DIVOTrack: A Novel Dataset and Baseline Method for Cross-View Multi-Object Tracking in DIVerse Open Scenes (2302.07676v2)

Published 15 Feb 2023 in cs.CV

Abstract: Cross-view multi-object tracking aims to link objects between frames and camera views with substantial overlaps. Although cross-view multi-object tracking has received increased attention in recent years, existing datasets still have several issues, including 1) missing real-world scenarios, 2) lacking diverse scenes, 3) owning a limited number of tracks, 4) comprising only static cameras, and 5) lacking standard benchmarks, which hinder the investigation and comparison of cross-view tracking methods. To solve the aforementioned issues, we introduce DIVOTrack: a new cross-view multi-object tracking dataset for DIVerse Open scenes with dense tracking pedestrians in realistic and non-experimental environments. Our DIVOTrack has fifteen distinct scenarios and 953 cross-view tracks, surpassing all cross-view multi-object tracking datasets currently available. Furthermore, we provide a novel baseline cross-view tracking method with a unified joint detection and cross-view tracking framework named CrossMOT, which learns object detection, single-view association, and cross-view matching with an all-in-one embedding model. Finally, we present a summary of current methodologies and a set of standard benchmarks with our DIVOTrack to provide a fair comparison and conduct a comprehensive analysis of current approaches and our proposed CrossMOT. The dataset and code are available at https://github.com/shengyuhao/DIVOTrack.

Citations (19)

Summary

  • The paper presents DIVOTrack, a novel dataset for cross-view multi-object tracking featuring diverse scenes, dynamic cameras, and substantial tracking data.
  • The authors propose CrossMOT, a unified joint detection and tracking framework utilizing a decoupled multi-head embedding and conflict-free loss.
  • The reported evaluation shows CrossMOT achieving higher tracking accuracy than the compared methods on DIVOTrack and other datasets, indicating that it is effective in diverse real-world scenarios.

DIVOTrack: Dataset and Baseline for Cross-View Multi-Object Tracking

The paper presents DIVOTrack, a dataset built to address shortcomings of existing cross-view multi-object tracking (MOT) datasets: few real-world scenarios, limited scene diversity, static cameras only, and a small number of tracks. To that end it provides diverse scenes, real-world footage, moving cameras, and a substantial amount of tracking data. The videos are captured in environments such as streets, shopping centers, and public spaces, which broadens the dataset's diversity and applicability.

Dataset Characteristics

DIVOTrack distinguishes itself from existing datasets by incorporating several key features:

  1. Real-world Data: The dataset includes both actors and random passers-by, offering an authentic representation of crowded settings in multiple scenarios.
  2. Diverse Scenes: With fifteen distinct scenarios covering both indoor and outdoor environments, such as parks, shopping malls, and public squares, DIVOTrack offers comprehensive coverage of different tracking environments.
  3. Camera Dynamics: Unlike traditional datasets, DIVOTrack utilizes moving cameras, including cell phones and UAVs, which introduces additional tracking complexity relevant to practical applications.
  4. Extensive Tracking Data: The dataset provides 1,690 single-view and 953 cross-view tracks, which the authors report is more than any other cross-view MOT dataset available at the time, giving a sizable platform for algorithm comparison and development.
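To make the difference between the two track counts concrete, here is a minimal sketch. It assumes that a single-view track is one identity within one camera view and that a cross-view track is one identity seen in at least two views; these definitions, the row layout, and the sample rows are illustrative assumptions, not the DIVOTrack annotation format.

```python
from collections import defaultdict

# Hypothetical annotation rows: (view, frame, global_id).
# This layout is an illustrative assumption, not the DIVOTrack file format.
annotations = [
    ("view_a", 1, 7), ("view_a", 2, 7),
    ("view_b", 1, 7), ("view_b", 2, 7),
    ("view_a", 1, 9), ("view_a", 2, 9),
    ("view_c", 2, 4),
]

def count_tracks(rows):
    """Return (single_view_tracks, cross_view_tracks) under the assumed definitions."""
    views_per_id = defaultdict(set)
    for view, _frame, global_id in rows:
        views_per_id[global_id].add(view)
    # Assumed: one single-view track per (identity, view) pair.
    single_view = sum(len(views) for views in views_per_id.values())
    # Assumed: one cross-view track per identity seen in at least two views.
    cross_view = sum(1 for views in views_per_id.values() if len(views) >= 2)
    return single_view, cross_view

single_view, cross_view = count_tracks(annotations)
print(single_view, cross_view)
```

Under these assumptions the single-view count can never be smaller than the cross-view count, since every cross-view identity contributes at least two single-view tracks.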

Baseline Method: CrossMOT

In conjunction with the dataset, the authors propose CrossMOT, a unified framework for joint detection and cross-view tracking. This baseline uses a decoupled multi-head embedding architecture to perform detection, single-view tracking, and cross-view tracking concurrently. CrossMOT also employs a conflict-free loss function to address ID conflicts that arise because the two embedding tasks have different goals: single-view tracking prioritizes temporal continuity, whereas cross-view tracking emphasizes consistent appearance across viewpoints.
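One plausible way to picture a conflict-free loss is a softmax cross-entropy that leaves certain classes out of the normalization. The sketch below assumes that the single-view head classifies each (person, view) pair as its own class and that the classes for the same person in other views should not act as negatives. This masking scheme and the function name are assumptions for illustration; the summary does not give the paper's exact formulation.

```python
import math

def conflict_free_cross_entropy(logits, target, conflicting):
    """Softmax cross-entropy that leaves out classes flagged as conflicting.

    logits      -- list of raw class scores
    target      -- index of the correct class
    conflicting -- set of class indices that denote the same person seen
                   from another view (hypothetical labeling scheme)
    """
    # Keep the target and every class that is not flagged as conflicting.
    kept = [i for i in range(len(logits)) if i == target or i not in conflicting]
    # Log-sum-exp over the kept classes, shifted by the max for stability.
    peak = max(logits[i] for i in kept)
    log_norm = peak + math.log(sum(math.exp(logits[i] - peak) for i in kept))
    return log_norm - logits[target]

# Class 1 is assumed to be the same person as class 0, seen from another view.
logits = [2.0, 2.0, 0.0]
masked = conflict_free_cross_entropy(logits, target=0, conflicting={1})
plain = conflict_free_cross_entropy(logits, target=0, conflicting=set())
print(masked < plain)
```

With the conflicting class masked out, the high score on class 1 no longer counts against the target, so the loss is lower than the plain cross-entropy for the same logits.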

CrossMOT Structure

The framework utilizes:

  • Detection Head: Building on CenterNet, it integrates object size and location prediction with confidence scoring.
  • Cross-view and Single-view Re-ID Heads: These heads specialize in extracting features for cross-view matching and single-view associations, respectively, mitigating ID conflicts through tailored loss functions.
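As an illustration of how embeddings from a cross-view Re-ID head could be used, the following sketch matches detections between two camera views by cosine similarity. It uses a simple greedy one-to-one assignment, which is a stand-in chosen for brevity and not necessarily the matching procedure in CrossMOT. The function names, the input dictionaries, and the 0.5 threshold are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def match_across_views(view_a, view_b, threshold=0.5):
    """Greedy one-to-one matching of detections by embedding similarity.

    view_a, view_b -- dicts mapping a local detection id (string) to its
                      cross-view embedding (hypothetical inputs)
    threshold      -- minimum similarity to accept a match (assumed value)
    """
    # Score every candidate pair and visit the most similar pairs first.
    pairs = sorted(
        ((cosine(ea, eb), ida, idb)
         for ida, ea in view_a.items()
         for idb, eb in view_b.items()),
        reverse=True,
    )
    used_a, used_b, matches = set(), set(), []
    for score, ida, idb in pairs:
        if score < threshold:
            break  # all remaining pairs are even less similar
        if ida in used_a or idb in used_b:
            continue  # each detection may be matched at most once
        used_a.add(ida)
        used_b.add(idb)
        matches.append((ida, idb))
    return matches

view_a = {"a1": [1.0, 0.0], "a2": [0.0, 1.0]}
view_b = {"b1": [0.1, 0.9], "b2": [0.9, 0.1], "b3": [-1.0, -1.0]}
print(sorted(match_across_views(view_a, view_b)))
```

Detections that find no partner above the threshold, such as "b3" here, stay unmatched, which corresponds to a person visible in only one view. An optimal assignment solver could replace the greedy loop when many detections compete for the same partner.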

Performance and Implications

The paper reports that CrossMOT outperforms existing methods on DIVOTrack and on established datasets such as CAMPUS and WILDTRACK. These results suggest that it handles dynamic, real-world scenes well and support the use of decoupled embeddings for multi-task learning within MOT.

Experimentation and Evaluation

The paper conducts thorough experiments, using standardized metrics like HOTA and CVMA, to benchmark various tracking methods. The results highlight the robustness and adaptability of CrossMOT across diverse environments, demonstrating its potential as a foundational model for cross-view MOT research.
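For readers unfamiliar with HOTA, the sketch below shows how the score at a single localization threshold combines detection accuracy and association accuracy as a geometric mean, and how the final score averages over thresholds. The counts and the association accuracy are made-up inputs; a real evaluation derives them from matched tracks, and the function names are illustrative.

```python
import math

def detection_accuracy(tp, fn, fp):
    """DetA at one threshold: TP / (TP + FN + FP)."""
    total = tp + fn + fp
    return tp / total if total else 0.0

def hota_at_threshold(tp, fn, fp, ass_a):
    """HOTA at one threshold: geometric mean of DetA and AssA."""
    return math.sqrt(detection_accuracy(tp, fn, fp) * ass_a)

def hota(per_threshold):
    """Average the per-threshold scores over all localization thresholds.

    per_threshold -- list of (tp, fn, fp, ass_a) tuples, one per threshold
    """
    scores = [hota_at_threshold(*row) for row in per_threshold]
    return sum(scores) / len(scores) if scores else 0.0

# Made-up counts for two thresholds, for illustration only.
print(hota([(80, 10, 10, 0.8), (60, 20, 20, 0.6)]))
```

Because of the geometric mean, a tracker cannot score well by excelling at detection alone or at association alone.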

Future Directions

The release of DIVOTrack and CrossMOT establishes a benchmark for evaluating and comparing cross-view tracking methods. Future work could extend the dataset to different weather conditions and enrich the annotations, for example with segmentation labels. Unified detection and tracking frameworks that incorporate spatial-temporal relations also remain an open direction for research.

In summary, the DIVOTrack dataset and the CrossMOT method offer substantial contributions to the field of cross-view multi-object tracking. They promise to facilitate advancements in intelligent surveillance systems and autonomous navigation technologies by providing a realistic testbed for algorithm development and evaluation.