MorpheuS: Neural Dynamic 360° Surface Reconstruction from Monocular RGB-D Video (2312.00778v2)
Abstract: Neural rendering has demonstrated remarkable success in dynamic scene reconstruction. Thanks to the expressiveness of neural representations, prior works can accurately capture the motion and achieve high-fidelity reconstruction of the target object. Despite this, real-world video scenarios often feature large unobserved regions where neural representations struggle to achieve realistic completion. To tackle this challenge, we introduce MorpheuS, a framework for dynamic 360° surface reconstruction from a casually captured RGB-D video. Our approach models the target scene as a canonical field that encodes its geometry and appearance, in conjunction with a deformation field that warps points from the current frame to the canonical space. We leverage a view-dependent diffusion prior and distill knowledge from it to achieve realistic completion of unobserved regions. Experimental results on various real-world and synthetic datasets show that our method can achieve high-fidelity 360° surface reconstruction of a deformable object from a monocular RGB-D video.
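The canonical-plus-deformation decomposition described in the abstract can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the function names `deformation_field`, `canonical_sdf`, `canonical_color`, and `query` are hypothetical, and both fields are hand-written analytic functions standing in for the learned networks. The diffusion prior is not modelled.

```python
import math
from typing import Tuple

Point = Tuple[float, float, float]

def deformation_field(x: Point, t: float) -> Point:
    """Toy stand-in for a learned deformation network (assumption).

    Maps a point observed at time t into canonical space. The toy object
    translates along +x by 0.5 * t, so the warp undoes that shift.
    """
    return (x[0] - 0.5 * t, x[1], x[2])

def canonical_sdf(x_c: Point, radius: float = 1.0) -> float:
    """Toy canonical geometry (assumption): signed distance to a sphere
    centred at the origin. Negative inside, zero on the surface."""
    return math.sqrt(x_c[0] ** 2 + x_c[1] ** 2 + x_c[2] ** 2) - radius

def canonical_color(x_c: Point) -> Tuple[float, float, float]:
    """Toy canonical appearance (assumption): a constant grey colour."""
    return (0.5, 0.5, 0.5)

def query(x: Point, t: float):
    """Warp an observed point to canonical space, then read geometry and
    appearance from the canonical field."""
    x_c = deformation_field(x, t)
    return canonical_sdf(x_c), canonical_color(x_c)

if __name__ == "__main__":
    # At t = 1 the toy sphere is centred at x = 0.5, so (1.5, 0, 0) lies on it.
    sdf, rgb = query((1.5, 0.0, 0.0), t=1.0)
    print(sdf, rgb)
```

Because geometry and appearance live only in the canonical field, every frame shares one surface, and per-frame motion is carried entirely by the warp.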