- The paper introduces Dense Optical Tracking (DOT), which combines sparse point tracking with a learnable refinement stage to produce dense flow fields with improved speed and accuracy.
- It uses nearest-neighbor interpolation of the sparse tracks as an initial dense estimate, then refines motion details with an optical flow model trained on synthetic data, keeping computational costs low.
- Empirical results show that DOT tracks motion at least two orders of magnitude faster than state-of-the-art point trackers while matching or exceeding their accuracy.
Dense Optical Tracking: Connecting the Dots – An Overview
The paper introduces Dense Optical Tracking (DOT), an approach that enhances optical flow estimation by integrating point tracking, improving on existing solutions in both accuracy and computational efficiency. DOT tracks a small set of sparse points and then propagates their motion to every pixel, producing dense flow fields and visibility masks across video frames with a learnable optical flow estimator. The result is a significant gain in both speed and accuracy over traditional optical flow methods and sophisticated universal trackers.
Core Methodology
The DOT method proceeds in three sequential stages:
- Sparse Track Extraction: A small set of point tracks, sampled near motion boundaries, is computed with an existing point tracking algorithm. These tracks anchor the subsequent dense motion estimate.
- Nearest-neighbor Interpolation: From these sparse tracks, DOT computes preliminary estimates of the dense flow field and visibility mask via nearest-neighbor interpolation. This initial guess gives a coarse picture of motion and occlusion patterns.
- Flow Refinement with a Learnable Estimator: The preliminary dense flow and visibility estimates are refined by a learnable optical flow model. Trained on synthetic data with ground-truth correspondences, the model explicitly accounts for occlusions to improve flow accuracy between source and target frames.
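The nearest-neighbor interpolation stage can be sketched as follows. This is an illustrative implementation, not the paper's actual code; the function name, array layout, and use of a k-d tree are assumptions for the sake of the example:

```python
import numpy as np
from scipy.spatial import cKDTree

def nn_interpolate_flow(track_xy, track_flow, track_vis, height, width):
    """Propagate sparse track motion to every pixel via nearest-neighbor lookup.

    track_xy:   (N, 2) array of (x, y) track positions in the source frame
    track_flow: (N, 2) displacement of each track from source to target frame
    track_vis:  (N,) boolean visibility of each track in the target frame
    Returns a dense (H, W, 2) flow field and an (H, W) visibility mask.
    """
    ys, xs = np.mgrid[0:height, 0:width]
    pixels = np.stack([xs.ravel(), ys.ravel()], axis=1)
    # For every pixel, copy the flow and visibility of the closest tracked point.
    _, idx = cKDTree(track_xy).query(pixels)
    flow = track_flow[idx].reshape(height, width, 2)
    vis = track_vis[idx].reshape(height, width)
    return flow, vis
```

Each pixel simply inherits the motion of its nearest track, which is why the result is only a coarse initial guess: it is piecewise constant and ignores image content, leaving the refinement stage to recover fine motion detail.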
Experimental Findings
Empirical evaluations demonstrate DOT's precision and speed advantages. The method tracks motion at least two orders of magnitude faster than many contemporary point tracking methods while matching or exceeding their accuracy. On the CVO and TAP-Vid benchmarks, DOT consistently improves error metrics and achieves a higher intersection over union (IoU) on occluded regions than both traditional optical flow techniques and state-of-the-art point trackers such as CoTracker.
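For reference, the occlusion IoU metric mentioned above compares a predicted occlusion mask against the ground truth. A generic sketch (not the paper's evaluation code) looks like this:

```python
import numpy as np

def occlusion_iou(pred_occ, gt_occ):
    """Intersection over union of predicted vs. ground-truth occlusion masks."""
    pred_occ = np.asarray(pred_occ, dtype=bool)
    gt_occ = np.asarray(gt_occ, dtype=bool)
    inter = np.logical_and(pred_occ, gt_occ).sum()
    union = np.logical_or(pred_occ, gt_occ).sum()
    # Convention: if neither mask marks any pixel occluded, count it as a perfect match.
    return inter / union if union else 1.0
```

A higher value means the predicted occluded regions overlap the true ones more closely.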
Implications and Future Directions
The implications of this research are broad within the field of computer vision tasks that require dense motion tracking, such as video editing, motion segmentation, and object tracking across frames. As methods like DOT continue to bridge the gap between point tracking and optical flow, the efficiency and applicability of such algorithms in real-time systems grow significantly.
Looking forward, further developments could involve enhancing DOT's integration with transformer models to better capture long-range dependencies or extending its application to 3D environments, offering a more comprehensive solution for complex motion scenarios. The adaptability of DOT underscores its potential as a versatile tool in both academic and practical applications where precise motion estimation is pivotal.