Solution for Point Tracking Task of ICCV 1st Perception Test Challenge 2023 (2403.17994v1)

Published 26 Mar 2024 in cs.CV and cs.LG

Abstract: This report proposes an improved method for the Tracking Any Point (TAP) task, which tracks any point on a physical surface through a video. Several existing approaches address TAP by modeling temporal relationships to obtain smooth point motion trajectories; however, they still suffer from cumulative error caused by temporal prediction. To address this issue, we propose a simple yet effective approach called TAP with confident static points (TAPIR+), which focuses on rectifying the tracking of static points in videos shot by a static camera. Our approach contains two key components: (1) Multi-granularity Camera Motion Detection, which identifies video sequences shot by a static camera; and (2) CMR-based point trajectory prediction, which uses a moving object segmentation approach to isolate static points from moving objects. Our approach ranked first in the final test with a score of 0.46.
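The static-point rectification idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `rectify_static_points`, its parameters, and the rule "a point is static if its query pixel is never covered by the moving-object mask" are all assumptions made for illustration. The camera-motion decision and the moving-object masks are taken as given inputs here, since the abstract does not specify how they are computed.

```python
import numpy as np


def rectify_static_points(tracks, query_xy, moving_masks,
                          camera_is_static, max_moving_frac=0.0):
    """Pin static points to their query location (hypothetical sketch).

    tracks:           (N, T, 2) predicted (x, y) positions from any tracker.
    query_xy:         (N, 2) query (x, y) position of each point.
    moving_masks:     (T, H, W) boolean masks, True where a pixel belongs
                      to a moving object (assumed to come from a separate
                      segmentation or background-subtraction step).
    camera_is_static: result of a camera-motion check (assumed given).
    max_moving_frac:  assumed tolerance on the fraction of frames in which
                      the query pixel may be flagged as moving.
    """
    tracks = np.asarray(tracks, dtype=float)
    if not camera_is_static:
        # With a moving camera, background points also move in image space,
        # so the predicted tracks are returned unchanged.
        return tracks.copy()

    query_xy = np.asarray(query_xy, dtype=float)
    moving_masks = np.asarray(moving_masks, dtype=bool)
    _, height, width = moving_masks.shape

    # Convert (x, y) query positions to integer pixel indices inside the frame.
    cols = np.clip(np.rint(query_xy[:, 0]).astype(int), 0, width - 1)
    rows = np.clip(np.rint(query_xy[:, 1]).astype(int), 0, height - 1)

    # Fraction of frames in which each query pixel lies on a moving object.
    moving_frac = moving_masks[:, rows, cols].mean(axis=0)  # shape (N,)
    is_static = moving_frac <= max_moving_frac

    # Replace the whole trajectory of each static point with its query position.
    rectified = tracks.copy()
    rectified[is_static] = query_xy[is_static][:, None, :]
    return rectified
```

Under these assumptions, a point whose query pixel is never flagged as moving has its drifting trajectory replaced by a constant position, which removes accumulated temporal error for that point, while points on moving objects keep the tracker's prediction.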
