MorpheuS: Neural Dynamic 360° Surface Reconstruction from Monocular RGB-D Video (2312.00778v2)

Published 1 Dec 2023 in cs.CV

Abstract: Neural rendering has demonstrated remarkable success in dynamic scene reconstruction. Thanks to the expressiveness of neural representations, prior works can accurately capture the motion and achieve high-fidelity reconstruction of the target object. Despite this, real-world video scenarios often feature large unobserved regions where neural representations struggle to achieve realistic completion. To tackle this challenge, we introduce MorpheuS, a framework for dynamic 360° surface reconstruction from a casually captured RGB-D video. Our approach models the target scene as a canonical field that encodes its geometry and appearance, in conjunction with a deformation field that warps points from the current frame to the canonical space. We leverage a view-dependent diffusion prior and distill knowledge from it to achieve realistic completion of unobserved regions. Experimental results on various real-world and synthetic datasets show that our method can achieve high-fidelity 360° surface reconstruction of a deformable object from a monocular RGB-D video.
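
The canonical-plus-deformation idea in the abstract can be illustrated with a minimal sketch. This is not the MorpheuS implementation: the learned networks are replaced here by hand-written analytic functions, geometry is assumed to be a signed distance function, and the names `deformation_field`, `canonical_sdf`, and `sdf_at_time` are hypothetical. The toy scene is a sphere that translates along the x-axis over time.

```python
import numpy as np


def deformation_field(points, t):
    """Toy stand-in for a learned deformation network (assumption).

    Warps points observed at time t into canonical space by undoing
    a translation of t units along the x-axis.
    """
    offset = np.array([t, 0.0, 0.0])
    return points - offset


def canonical_sdf(points_c, radius=0.5):
    """Toy stand-in for a learned canonical geometry field (assumption).

    Signed distance to a sphere of the given radius centred at the
    canonical origin: negative inside, zero on the surface.
    """
    return np.linalg.norm(points_c, axis=-1) - radius


def sdf_at_time(points, t):
    """Query geometry in the current frame by warping to canonical space."""
    return canonical_sdf(deformation_field(points, t))


# At t = 1 the sphere centre sits at (1, 0, 0) in the observed frame.
pts = np.array([
    [1.0, 0.0, 0.0],  # sphere centre at t = 1
    [1.5, 0.0, 0.0],  # on the surface at t = 1
    [0.0, 0.0, 0.0],  # outside the sphere at t = 1
])
sdf_values = sdf_at_time(pts, t=1.0)
print(sdf_values)
```

Because every frame is queried through the same canonical field, geometry observed at one time step constrains the shape at all others; in a learned setting both functions would be networks optimized jointly.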
