
Context-PIPs: Persistent Independent Particles Demands Spatial Context Features (2306.02000v2)

Published 3 Jun 2023 in cs.CV

Abstract: We tackle the problem of Persistent Independent Particles (PIPs), also called Tracking Any Point (TAP), in videos, which aims at estimating persistent long-term trajectories of query points in videos. Previous methods estimated these trajectories independently in order to incorporate longer image sequences, thereby ignoring the potential benefits of spatial context features. We argue that independent video point tracking also demands spatial context features. To this end, we propose a novel framework, Context-PIPs, which effectively improves point trajectory accuracy by aggregating spatial context features in videos. Context-PIPs contains two main modules: 1) a SOurce Feature Enhancement (SOFE) module, and 2) a TArget Feature Aggregation (TAFA) module. Context-PIPs significantly improves PIPs across the board, reducing the Average Trajectory Error of Occluded Points (ATE-Occ) on CroHD by 11.4% and increasing the Average Percentage of Correct Keypoints (A-PCK) on TAP-Vid-Kinetics by 11.8%. Demos are available at https://wkbian.github.io/Projects/Context-PIPs/.
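To make the two reported metrics concrete, below is a minimal sketch of how trajectory-error and PCK-style metrics are typically computed for point tracking. Function names, array shapes, and the pixel thresholds are illustrative assumptions, not the paper's exact evaluation code.

```python
import numpy as np

def average_trajectory_error(pred, gt, mask=None):
    """Mean Euclidean distance between predicted and ground-truth positions.

    pred, gt: (T, N, 2) arrays of (x, y) point positions over T frames.
    mask:     optional (T, N) boolean array restricting the average to a
              subset of points, e.g. occluded points for an ATE-Occ-style
              metric (assumed semantics, for illustration only).
    """
    err = np.linalg.norm(pred - gt, axis=-1)  # (T, N) per-point pixel errors
    if mask is not None:
        err = err[mask]
    return float(err.mean())

def average_pck(pred, gt, thresholds=(1, 2, 4, 8, 16)):
    """Fraction of points within a pixel threshold of ground truth,
    averaged over several thresholds (A-PCK-style; threshold values
    are an assumption here)."""
    err = np.linalg.norm(pred - gt, axis=-1)
    return float(np.mean([(err < t).mean() for t in thresholds]))
```

Lower ATE is better, while higher A-PCK is better, which is why the abstract reports a reduction for one and an increase for the other.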

