
DynOMo: Online Point Tracking by Dynamic Online Monocular Gaussian Reconstruction (2409.02104v2)

Published 3 Sep 2024 in cs.CV

Abstract: Reconstructing scenes and tracking motion are two sides of the same coin. Tracking points allows for geometric reconstruction [14], while geometric reconstruction of (dynamic) scenes allows for 3D tracking of points over time [24, 39]. The latter was recently also exploited for 2D point tracking to overcome occlusion ambiguities by lifting tracking directly into 3D [38]. However, the above approaches either require offline processing or multi-view camera setups, both of which are unrealistic for real-world applications like robot navigation or mixed reality. We target the challenge of online 2D and 3D point tracking from unposed monocular camera input, introducing Dynamic Online Monocular Reconstruction (DynOMo). We leverage 3D Gaussian splatting to reconstruct dynamic scenes in an online fashion. Our approach extends 3D Gaussians to capture new content and object motions while estimating camera movements from a single RGB frame. DynOMo stands out by enabling the emergence of point trajectories through robust image feature reconstruction and a novel similarity-enhanced regularization term, without requiring any correspondence-level supervision. It sets the first baseline for online point tracking with monocular unposed cameras, achieving performance on par with existing methods. We aim to inspire the community to advance online point tracking and reconstruction, expanding the applicability to diverse real-world scenarios.
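
The abstract only sketches the method at a high level; in particular, the similarity-enhanced regularization term is not spelled out here. As a purely illustrative aid, the toy NumPy sketch below shows one way a similarity-weighted local-rigidity penalty on Gaussian centers could be formed: each Gaussian's frame-to-frame displacement is compared against that of its spatial neighbors, with neighbors weighted by per-Gaussian feature similarity. The function name, neighborhood size, and softmax weighting are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def similarity_weighted_rigidity(mu_prev, mu_curr, feats, k=8, tau=0.1):
    """Toy similarity-weighted local-rigidity penalty (illustrative only).

    mu_prev, mu_curr : (N, 3) Gaussian centers at frames t-1 and t
    feats            : (N, D) per-Gaussian feature vectors, assumed L2-normalized
    k                : number of spatial neighbors considered per Gaussian
    tau              : temperature of the similarity weighting
    """
    disp = mu_curr - mu_prev                                         # (N, 3) per-Gaussian motion

    # k nearest spatial neighbors in the previous frame (brute force for clarity)
    d2 = ((mu_prev[:, None, :] - mu_prev[None, :, :]) ** 2).sum(-1)  # (N, N) squared distances
    np.fill_diagonal(d2, np.inf)                                     # exclude self
    nbrs = np.argsort(d2, axis=1)[:, :k]                             # (N, k) neighbor indices

    # Neighbors that look alike (high feature similarity) get larger weights.
    sim = (feats[:, None, :] * feats[nbrs]).sum(-1)                  # (N, k) cosine similarity
    w = np.exp(sim / tau)
    w = w / w.sum(axis=1, keepdims=True)                             # row-wise softmax

    # Penalize a Gaussian for moving differently from its (similar) neighbors.
    diff = disp[:, None, :] - disp[nbrs]                             # (N, k, 3)
    return float((w * (diff ** 2).sum(-1)).mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    N, D = 200, 32
    mu_prev = rng.normal(size=(N, 3))
    mu_curr = mu_prev + 0.01 * rng.normal(size=(N, 3))               # small per-Gaussian motion
    feats = rng.normal(size=(N, D))
    feats /= np.linalg.norm(feats, axis=1, keepdims=True)
    print("rigidity penalty:", similarity_weighted_rigidity(mu_prev, mu_curr, feats))
```

In an online loop, a term like this would be added to the photometric and feature reconstruction losses at each incoming frame, so that point trajectories (the Gaussian centers over time) stay locally coherent without correspondence-level supervision.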

References (50)
  1. Mip-NeRF 360: Unbounded anti-aliased neural radiance fields. In CVPR, 2022.
  2. ORB-SLAM3: An accurate open-source library for visual, visual-inertial, and multimap SLAM. IEEE Transactions on Robotics, 37(6):1874–1890, 2021.
  3. MonoSLAM: Real-time single camera SLAM. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(6):1052–1067, 2007.
  4. TAP-Vid: A benchmark for tracking any point in a video. In NeurIPS, 2022.
  5. TAPIR: Tracking any point with per-frame initialization and temporal refinement. In ICCV, 2023.
  6. MD-Splatting: Learning metric deformation from 4D Gaussians in highly deformable scenes. CoRR, 2023.
  7. Simultaneous localization and mapping: Part I. IEEE Robotics & Automation Magazine, 13(2):99–110, 2006.
  8. Learning affinity functions for image segmentation: Combining patch-based and gradient-based approaches. In CVPR, 2003.
  9. Dynamic view synthesis from dynamic monocular video. In ICCV, 2021.
  10. Dynamic novel-view synthesis: A reality check. In NeurIPS, 2022.
  11. Particle video revisited: Tracking through occlusions using point trajectories. In ECCV, 2022.
  12. Panoptic Studio: A massively multiview system for social motion capture. In ICCV, 2015.
  13. CoTracker: It is better to track together. 2023.
  14. SplaTAM: Splat, track & map 3D Gaussians for dense RGB-D SLAM. In CVPR, 2024.
  15. 3D Gaussian splatting for real-time radiance field rendering. ACM Transactions on Graphics, 2023.
  16. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
  17. Decomposing NeRF for editing via feature field distillation. In NeurIPS, 2022.
  18. vMAP: Vectorised object mapping for neural field SLAM. In CVPR, 2023.
  19. DynMF: Neural motion factorization for real-time dynamic view synthesis with 3D Gaussian splatting. arXiv, 2023.
  20. Dense RGB SLAM with neural implicit maps. In ICLR, 2023.
  21. Neural scene flow fields for space-time view synthesis of dynamic scenes. In CVPR, 2021.
  22. DynIBaR: Neural dynamic image-based rendering. In CVPR, 2023.
  23. Spacetime Gaussian feature splatting for real-time dynamic view synthesis. In CVPR, 2024.
  24. Dynamic 3D Gaussians: Tracking by persistent dynamic view synthesis. In 3DV, 2024.
  25. NeRF in the Wild: Neural radiance fields for unconstrained photo collections. In CVPR, 2021.
  26. Gaussian Splatting SLAM. In CVPR, 2024.
  27. NeRF: Representing scenes as neural radiance fields for view synthesis. In ECCV, 2020.
  28. Instant neural graphics primitives with a multiresolution hash encoding. ACM Transactions on Graphics, 41(4):1–15, 2022.
  29. DINOv2: Learning robust visual features without supervision. arXiv, 2023.
  30. DINOv2: Learning robust visual features without supervision. Transactions on Machine Learning Research, 2024.
  31. Nerfies: Deformable neural radiance fields. In ICCV, 2021.
  32. HyperNeRF: A higher-dimensional representation for topologically varying neural radiance fields. ACM Transactions on Graphics, 2021.
  33. D-NeRF: Neural radiance fields for dynamic scenes. In CVPR, 2021.
  34. Urban radiance fields. In CVPR, 2022.
  35. Particle video: Long-range motion estimation using point trajectories. International Journal of Computer Vision, 80:72–91, 2008.
  36. iMAP: Implicit mapping and positioning in real-time. In ICCV, 2021.
  37. 3DGStream: On-the-fly training of 3D Gaussians for efficient streaming of photo-realistic free-viewpoint videos. In ECCV, 2024.
  38. Tracking everything everywhere all at once. In ICCV, 2023.
  39. Shape of Motion: 4D reconstruction from a single video. arXiv preprint arXiv:2407.13764, 2024.
  40. 4D Gaussian splatting for real-time dynamic scene rendering. In CVPR, 2024.
  41. Space-time neural irradiance fields for free-viewpoint video. In CVPR, 2021.
  42. SpatialTracker: Tracking any 2D pixels in 3D space. In CVPR, 2024.
  43. EmerNeRF: Emergent spatial-temporal scene decomposition via self-supervision. arXiv preprint arXiv:2311.02077, 2023.
  44. Depth Anything: Unleashing the power of large-scale unlabeled data. In CVPR, 2024.
  45. Deformable 3D Gaussians for high-fidelity monocular dynamic scene reconstruction. In CVPR, 2024.
  46. Real-time photorealistic dynamic scene representation and rendering with 4D Gaussian splatting. In ICLR, 2024.
  47. PointOdyssey: A large-scale synthetic dataset for long-term point tracking. In ICCV, 2023.
  48. NICE-SLAM: Neural implicit scalable encoding for SLAM. In CVPR, 2022.
  49. NICER-SLAM: Neural implicit scene encoding for RGB SLAM. In 3DV, 2024.
  50. EWA volume splatting. In IEEE Visualization, 2001.