Recent Trends in 3D Reconstruction of General Non-Rigid Scenes (2403.15064v2)

Published 22 Mar 2024 in cs.CV and cs.GR

Abstract: Reconstructing models of the real world, including the 3D geometry, appearance, and motion of real scenes, is essential for computer graphics and computer vision. It enables the synthesis of photorealistic novel views, useful for the movie industry and AR/VR applications. It also facilitates the content creation necessary in computer games and AR/VR by avoiding laborious manual design processes. Further, such models are fundamental for intelligent computing systems that need to interpret real-world scenes and actions to act and interact safely with the human world. Notably, the world surrounding us is dynamic, and reconstructing models of dynamic, non-rigidly moving scenes is a severely underconstrained and challenging problem. This state-of-the-art report (STAR) offers the reader a comprehensive summary of state-of-the-art techniques with monocular and multi-view inputs such as data from RGB and RGB-D sensors, among others, conveying an understanding of different approaches, their potential applications, and promising further research directions. The report covers 3D reconstruction of general non-rigid scenes and further addresses the techniques for scene decomposition, editing and controlling, and generalizable and generative modeling. More specifically, we first review the common and fundamental concepts necessary to understand and navigate the field, and then discuss the state-of-the-art techniques by reviewing recent approaches that use traditional and machine-learning-based neural representations, including a discussion of the newly enabled applications. The STAR concludes with a discussion of the remaining limitations and open challenges.

References (410)
  1. Hyperreel: High-fidelity 6-dof video with ray-conditioned sampling. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023), pp. 16610–16620.
  2. Matryodshka: Real-time 6dof video view synthesis using multi-sphere images. In European Conference on Computer Vision (ECCV) (2020), pp. 441–459.
  3. Törf: Time-of-flight radiance fields for dynamic scene view synthesis. In Advances in Neural Information Processing Systems (NeurIPS) (2021), vol. 34, pp. 26289–26301.
  4. Aygün M., Mac Aodha O.: Saor: Single-view articulated object reconstruction. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2024).
  5. Agarwal P., Prabhakaran B.: Robust blind watermarking of point-sampled geometry. IEEE Transactions on Information Forensics and Security 4, 1 (2009), 36–48.
  6. Endomapper dataset of complete calibrated endoscopy procedures. Scientific Data 10, 1 (2023), 671.
  7. Nerd: Neural reflectance decomposition from image collections. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2021), pp. 12684–12694.
  8. Semantic monocular slam for highly dynamic environments. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (2018), IEEE, pp. 393–400.
  9. Flowibr: Leveraging pre-training for efficient neural image-based rendering of dynamic scenes, 2023. arXiv:2309.05418.
  10. Dynaslam: Tracking, mapping, and inpainting in dynamic scenes. IEEE Robotics and Automation Letters (RA-L) 3, 4 (2018), 4076–4083.
  11. Instructpix2pix: Learning to follow image editing instructions. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023).
  12. High-quality surface splatting on today’s gpus. In Eurographics/IEEE VGTC Symposium on Point-Based Graphics (PBG) (2005), pp. 17–141. doi:10.1109/PBG.2005.194059.
  13. Fullfusion: A framework for semantic reconstruction of dynamic scenes. In IEEE/CVF International Conference on Computer Vision Workshop (ICCVW) (2019).
  14. Acefusion-accelerated and energy-efficient semantic 3d reconstruction of dynamic scenes. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (2022), IEEE, pp. 11063–11070.
  15. Deep relightable appearance models for animatable faces. ACM Transactions on Graphics (ToG) 40, 4 (2021), 1–15.
  16. Blum H.: A transformation for extracting new descriptors of shape. In Models for Perception of Speech and Visual Form, Wathen-Dunn W., (Ed.). MIT Press, Cambridge, MA, 1967.
  17. Evdnerf: Reconstructing event data with dynamic neural radiance fields. In IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) (January 2024), pp. 5846–5855.
  18. Mip-nerf 360: Unbounded anti-aliased neural radiance fields. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022), pp. 5470–5479.
  19. 3d-aware video generation. Transactions on Machine Learning Research (TMLR) (2023). URL: https://openreview.net/forum?id=SwlfyDq6B3.
  20. Neural non-rigid tracking. In Advances in Neural Information Processing Systems (NeurIPS) (2020), vol. 33, pp. 18727–18737.
  21. Neural deformation graphs for globally-consistent non-rigid reconstruction. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2021), pp. 1450–1459.
  22. Dynamic FAUST: Registering human bodies in motion. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2017).
  23. 4d-fy: Text-to-4d generation using hybrid score distillation sampling. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2024).
  24. Behave: Dataset and method for tracking human object interactions. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (jun 2022).
  25. Flare: Fast learning of animatable and relightable mesh avatars. ACM Transactions on Graphics (ToG) 42 (Dec. 2023), 15. doi:https://doi.org/10.1145/3618401.
  26. Deepdeform: Learning non-rigid rgb-d reconstruction with semi-supervised data. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2020), pp. 7002–7012.
  27. Chang H., Boularias A.: Scene-level tracking and reconstruction without object priors. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (2022), IEEE, pp. 3785–3792.
  28. Carroll J. D., Chang J.-J.: Analysis of individual differences in multidimensional scaling via an n-way generalization of “eckart-young” decomposition. Psychometrika 35, 3 (1970), 283–319.
  29. Spacetime surface regularization for neural dynamic scene reconstruction. In IEEE/CVF International Conference on Computer Vision (ICCV) (October 2023), pp. 17871–17881.
  30. Gaussianeditor: Swift and controllable 3d editing with gaussian splatting, 2023. arXiv:2311.14521.
  31. Neural surface reconstruction of dynamic scenes with monocular rgb-d camera. In Advances in Neural Information Processing Systems (NeurIPS) (2022), vol. 35, pp. 967–981.
  32. Single-stage diffusion nerf: A unified approach to 3d generation and reconstruction. In IEEE/CVF International Conference on Computer Vision (ICCV) (2023).
  33. Cao A., Johnson J.: Hexplane: A fast representation for dynamic scenes. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023), pp. 130–141.
  34. Neumesh: Learning disentangled neural mesh-based implicit field for geometry and texture editing. In European Conference on Computer Vision (ECCV) (2022).
  35. Fast-snarf: A fast deformer for articulated neural fields. Transactions on Pattern Analysis and Machine Intelligence (TPAMI) 45, 10 (2023), 11796–11809.
  36. Chen Z., Liu Z.: Relighting4d: Neural relightable human from videos. In European Conference on Computer Vision (ECCV) (2022), Springer, pp. 606–623.
  37. Efficient geometry-aware 3d generative adversarial networks. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022), pp. 16123–16133.
  38. Efficient geometry-aware 3D generative adversarial networks. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022).
  39. Dynamic multi-view scene reconstruction using neural implicit surface. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2023), IEEE, pp. 1–5.
  40. Deep local shapes: Learning local sdf priors for detailed 3d reconstruction. In European Conference on Computer Vision (ECCV) (2020).
  41. Neusg: Neural implicit surface reconstruction with 3d gaussian splatting guidance, 2023. arXiv:2312.00846.
  42. NeuralEditor: Editing neural radiance fields via manipulating point clouds. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023).
  43. Physics informed neural fields for smoke reconstruction with sparse data. ACM Transactions on Graphics (ToG) 41, 4 (2022), 1–14.
  44. Nerrf: 3d reconstruction and view synthesis for transparent and specular objects with neural refractive-reflective fields, 2023. arXiv:2309.13039.
  45. GeNVS: Generative novel view synthesis with 3D-aware diffusion models. In IEEE/CVF International Conference on Computer Vision (ICCV) (2023).
  46. The isowarp: the template-based visual geometry of isometric surfaces. International Journal of Computer Vision (IJCV) 129, 7 (2021), 2194–2222.
  47. Neural ordinary differential equations. In Advances in Neural Information Processing Systems (NeurIPS) (2018), Bengio S., Wallach H. M., Larochelle H., Grauman K., Cesa-Bianchi N., Garnett R., (Eds.), pp. 6572–6583.
  48. Mono-star: Mono-camera scene-level tracking and reconstruction. In IEEE International Conference on Robotics and Automation (ICRA) (2023), pp. 820–826. doi:10.1109/ICRA48891.2023.10160778.
  49. Chen Q.-A., Tsukada A.: Flow supervised neural radiance fields for static-dynamic decomposition. In IEEE International Conference on Robotics and Automation (ICRA) (2022), IEEE, pp. 10641–10647.
  50. Emerging properties in self-supervised vision transformers. In IEEE/CVF International Conference on Computer Vision (ICCV) (2021).
  51. Virtual elastic objects. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022), pp. 15827–15837.
  52. Text-to-3d using gaussian splatting, 2023. arXiv:2309.16585.
  53. Tensorf: Tensorial radiance fields. In European Conference on Computer Vision (ECCV) (2022).
  54. Chen Z., Zhang H.: Learning implicit fields for generative shape modeling. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2019).
  55. Snarf: Differentiable forward skinning for animating non-rigid neural implicit shapes. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2021), pp. 11594–11604.
  56. Learning multi-object dynamics with compositional neural radiance fields. In Conference on Robot Learning (CoRL) (2023), PMLR, pp. 1755–1768.
  57. De Lathauwer L.: Decompositions of a higher-order tensor in block terms—part ii: Definitions and uniqueness. SIAM Journal on Matrix Analysis and Applications 30, 3 (2008), 1033–1066.
  58. Nasa neural articulated shape approximation. In European Conference on Computer Vision (ECCV) (2020), Springer, pp. 612–628.
  59. Mofusion: A framework for denoising-diffusion-based motion synthesis. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (June 2023), pp. 9760–9770.
  60. Gravity-aware monocular 3d human-object reconstruction. In IEEE/CVF International Conference on Computer Vision (ICCV) (2021).
  61. Neural parametric gaussians for monocular non-rigid object reconstruction. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2024).
  62. Neural radiance flow for 4d view synthesis and video processing. In IEEE/CVF International Conference on Computer Vision (ICCV) (2021), IEEE Computer Society, pp. 14304–14314.
  63. Hyperdiffusion: Generating implicit neural fields with weight-space diffusion. In IEEE/CVF International Conference on Computer Vision (ICCV) (2023).
  64. 3d motion magnification: Visualizing subtle motions from time-varying radiance fields. In IEEE/CVF International Conference on Computer Vision (ICCV) (2023), pp. 9837–9846.
  65. Deepview: View synthesis with learned gradient descent. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2019).
  66. Texture-generic deep shape-from-template. IEEE Access 9 (2021), 75211–75230.
  67. K-planes: Explicit radiance fields in space, time, and appearance. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023), pp. 12479–12488.
  68. Plenoxels: Radiance fields without neural networks. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022).
  69. Gaussianeditor: Editing 3d gaussians delicately with text instructions. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2024).
  70. Fast dynamic radiance fields with time-aware neural voxels. In ACM SIGGRAPH Asia (New York, NY, USA, 2022), SA ’22, Association for Computing Machinery. doi:10.1145/3550469.3555383.
  71. Trajectory optimization for physics-based reconstruction of 3d human pose from monocular video. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022), pp. 13106–13115.
  72. Graßhof S., Brandt S. S.: Tensor-based non-rigid structure from motion. In IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) (2022), pp. 3011–3020.
  73. Neurofluid: Fluid dynamics grounding with particle-driven neural radiance fields. In International Conference on Machine Learning (ICML) (2022), PMLR, pp. 7919–7929.
  74. Intrinsic dynamic shape prior for dense non-rigid structure from motion. In International Conference on 3D Vision (3DV) (2020).
  75. Voltemorph: Realtime, controllable and generalisable animation of volumetric representations, 2022. arXiv:2208.00949.
  76. Guédon A., Lepetit V.: Sugar: Surface-aligned gaussian splatting for efficient 3d mesh reconstruction and high-quality mesh rendering. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2024).
  77. The relightables: Volumetric performance capture of humans with realistic relighting. ACM Transactions on Graphics (ToG) 38, 6 (2019), 1–19.
  78. Monocular dynamic view synthesis: A reality check. In Advances in Neural Information Processing Systems (NeurIPS) (2022).
  79. Densepose: Dense human pose estimation in the wild. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (June 2018).
  80. Compact neural volumetric video representations with dynamic codebooks. In Advances in Neural Information Processing Systems (NeurIPS) (2023), Oh A., Neumann T., Globerson A., Saenko K., Hardt M., Levine S., (Eds.), vol. 36, Curran Associates, Inc., pp. 75884–75895. URL: https://proceedings.neurips.cc/paper_files/paper/2023/file/ef63b00ad8475605b2eaf520747f61d4-Paper-Conference.pdf.
  81. Sd-defslam: Semi-direct monocular slam for deformable and intracorporeal scenes. In IEEE International Conference on Robotics and Automation (ICRA) (2021), IEEE, pp. 5170–5177.
  82. Forward flow for novel view synthesis of dynamic scenes. In IEEE/CVF International Conference on Computer Vision (ICCV) (October 2023), pp. 16022–16033.
  83. Dynamic view synthesis from dynamic monocular video. In IEEE/CVF International Conference on Computer Vision (ICCV) (2021).
  84. Gao W., Tedrake R.: Surfelwarp: Efficient non-volumetric single view dynamic reconstruction. In Robotics: Science and Systems (RSS) (2018).
  85. Hart J. C.: Sphere tracing: a geometric method for the antialiased ray tracing of implicit surfaces. The Visual Computer 12, 10 (Dec. 1996), 527–545.
  86. Non-rigid registration under isometric deformations. Computer Graphics Forum 27, 5 (2008), 1449–1457. URL: https://onlinelibrary.wiley.com/doi/abs/10.1111/j.1467-8659.2008.01285.x, arXiv:https://onlinelibrary.wiley.com/doi/pdf/10.1111/j.1467-8659.2008.01285.x, doi:https://doi.org/10.1111/j.1467-8659.2008.01285.x.
  87. Arapreg: An as-rigid-as possible regularization loss for learning deformable shape generators. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2021), pp. 5815–5825.
  88. Carto: Category and joint agnostic reconstruction of articulated objects. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023), pp. 21201–21210.
  89. Denoising diffusion probabilistic models. In Advances in Neural Information Processing Systems (NeurIPS) (2020), Larochelle H., Ranzato M., Hadsell R., Balcan M., Lin H., (Eds.).
  90. Reconstructing hand-held objects from monocular video. In ACM SIGGRAPH Asia (New York, NY, USA, 2022), SA ’22, Association for Computing Machinery. doi:10.1145/3550469.3555401.
  91. Ditto in the house: Building articulation models of indoor scenes through interactive perception. In IEEE International Conference on Robotics and Automation (ICRA) (2023).
  92. Neuralsim: Augmenting differentiable simulators with neural networks. In IEEE International Conference on Robotics and Automation (ICRA) (2021), IEEE, pp. 9474–9481.
  93. Fast tetrahedral meshing in the wild. ACM Transactions on Graphics (ToG) 39, 4 (2020), 117–1.
  94. Instruct-nerf2nerf: Editing 3d scenes with instructions. In IEEE/CVF International Conference on Computer Vision (ICCV) (2023).
  95. Semantic view synthesis. In European Conference on Computer Vision (ECCV) (2020), pp. 592–608.
  96. SAIL-VOS 3D: A Synthetic Dataset and Baselines for Object Detection and 3D Mesh Reconstruction from Video Data. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2021).
  97. Physical interaction: Reconstructing hand-object interactions with physics. In ACM SIGGRAPH Asia (New York, NY, USA, 2022), SA ’22, Association for Computing Machinery. doi:10.1145/3550469.3555421.
  98. Hartley R., Zisserman A.: Multiple view geometry in computer vision. Cambridge university press, 2003.
  99. Rana: Relightable articulated neural avatars. In IEEE/CVF International Conference on Computer Vision (ICCV) (2023), pp. 23142–23153.
  100. Optical non-line-of-sight physics-based 3d human pose estimation. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2020).
  101. VolumeDeform: Real-time Volumetric Non-rigid Reconstruction. In European Conference on Computer Vision (ECCV) (2016).
  102. Unbiased 4d: Monocular 4d reconstruction with a neural deformation model. In IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshop (CVPRW) (2023), pp. 6597–6606.
  103. Shapeflow: Learnable deformation flows among 3d shapes. Advances in Neural Information Processing Systems (NeurIPS) 33 (2020), 9745–9757.
  104. Neuralhofusion: Neural volumetric rendering under human-object interactions. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022), pp. 6155–6165.
  105. Screwnet: Category-independent articulation model estimation from depth images using screw theory. In IEEE International Conference on Robotics and Automation (ICRA) (2021), IEEE, pp. 13670–13677.
  106. Tensoir: Tensorial inverse rendering. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (June 2023), pp. 165–174.
  107. Tensoir: Tensorial inverse rendering. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023), pp. 165–174.
  108. Harmonic coordinates for character articulation. ACM Transactions on Graphics (ToG) 26, 3 (2007), 71–es.
  109. Super slomo: High quality estimation of multiple intermediate frames for video interpolation. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2018), Computer Vision Foundation / IEEE Computer Society, pp. 9000–9008.
  110. The material point method for simulating continuum materials. In ACM SIGGRAPH 2016 Courses. 2016, pp. 1–52.
  111. Mean value coordinates for closed triangular meshes. In Seminal Graphics Papers: Pushing the Boundaries, Volume 2. 2023, pp. 223–228.
  112. Neuman: Neural human radiance field from a single video. In European Conference on Computer Vision (ECCV) (2022), Springer, pp. 402–418.
  113. Instant-nvr: Instant neural volumetric rendering for human-object interactions from monocular rgbd stream. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023), pp. 595–605.
  114. Learning compositional representation for 4d captures with neural ode. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2021), pp. 5340–5350.
  115. Dreamhuman: Animatable 3d avatars from text, 2023. arXiv:2306.09329.
  116. Fast non-rigid radiance fields from monocularized data. IEEE Transactions on Visualization and Computer Graphics (TVCG) (2024).
  117. Panoptic neural fields: A semantic object-aware neural scene representation. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022), pp. 12871–12881.
  118. Lerf: Language embedded radiance fields. In IEEE/CVF International Conference on Computer Vision (ICCV) (2023), pp. 19729–19739.
  119. Splatam: Splat, track & map 3d gaussians for dense rgb-d slam, 2023. arXiv:2312.02126.
  120. Camm: Building category-agnostic and animatable 3d models from monocular videos. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023), pp. 6586–6596.
  121. 3d gaussian splatting for real-time radiance field rendering. ACM Transactions on Graphics (ToG) 42, 4 (2023), 1–14.
  122. Geometric modeling in shape space. ACM Transactions on Graphics (ToG) 26, 3 (jul 2007), 64–es. URL: https://doi.org/10.1145/1276377.1276457, doi:10.1145/1276377.1276457.
  123. Decomposing nerf for editing via feature field distillation. In Advances in Neural Information Processing Systems (NeurIPS) (2022).
  124. f-sft: Shape-from-template with a physics-based deformation model. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022), pp. 3948–3958.
  125. Kumar S., Van Gool L.: Organic priors in non-rigid structure from motion. In European Conference on Computer Vision (ECCV) (2022), Springer, pp. 71–88.
  126. Holodiffusion: Training a 3D diffusion model using 2D images. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023).
  127. Kim B., Ye J. C.: Diffusion deformable model for 4d temporal medical image generation. In International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI) (2022), Springer, pp. 539–548.
  128. Conerf: Controllable neural radiance fields. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022), pp. 18623–18632.
  129. D &d: Learning human dynamics from dynamic camera. In European Conference on Computer Vision (ECCV) (2022).
  130. Lorensen W. E., Cline H. E.: Marching cubes: A high resolution 3d surface construction algorithm. In ACM SIGGRAPH (New York, NY, USA, 1987), SIGGRAPH ’87, Association for Computing Machinery, p. 163–169. URL: https://doi.org/10.1145/37401.37422, doi:10.1145/37401.37422.
  131. Flow matching for generative modeling. In International Conference on Learning Representations (ICLR) (2023).
  132. 3d vision with transformers: A survey, 2022. arXiv:2208.04309.
  133. Spacetime gaussian feature splatting for real-time dynamic view synthesis. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2024).
  134. Devrf: Fast deformable voxel radiance fields for dynamic scenes. In Advances in Neural Information Processing Systems (NeurIPS) (2022), vol. 35, pp. 36762–36775.
  135. Pointinet: Point cloud frame interpolation network. In AAAI Conference on Artificial Intelligence (2021), vol. 35, pp. 2251–2259.
  136. Dynvideo-e: Harnessing dynamic nerf for large-scale motion- and view-change human-centric video editing. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2024).
  137. Hosnerf: Dynamic human-object-scene neural radiance fields from a single video. In IEEE/CVF International Conference on Computer Vision (ICCV) (2023), pp. 18483–18494.
  138. Repaint: Inpainting using denoising diffusion probabilistic models. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022).
  139. Nap: Neural 3d articulated object prior. In Advances in Neural Information Processing Systems (NeurIPS) (2023).
  140. Li C., Guo X.: Topology-change-aware volumetric fusion for dynamic scene reconstruction. In European Conference on Computer Vision (ECCV) (2020), Springer, pp. 258–274.
  141. Robust dynamic radiance fields. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023), pp. 13–23.
  142. Control-nerf: Editable feature volumes for scene rendering and manipulation. In IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) (2023), pp. 4340–4350.
  143. Nerfacc: Efficient sampling accelerates nerfs. In IEEE/CVF International Conference on Computer Vision (ICCV) (Los Alamitos, CA, USA, oct 2023), IEEE Computer Society, pp. 18491–18500. URL: https://doi.ieeecomputersociety.org/10.1109/ICCV51070.2023.01699, doi:10.1109/ICCV51070.2023.01699.
  144. Building rearticulable models for arbitrary 3d objects from 4d point clouds. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023), pp. 21138–21147.
  145. Neural sparse voxel fields. In Advances in Neural Information Processing Systems (NeurIPS) (2020), vol. 33, pp. 15651–15663.
  146. Neural actor: Neural free-view synthesis of human actors with pose control. ACM Transactions on Graphics (ToG) 40, 6 (2021), 1–16.
  147. Dynamic 3d gaussians: Tracking by persistent dynamic view synthesis. In International Conference on 3D Vision (3DV) (2024).
  148. Lin A., Li J.: Dynamic appearance particle neural radiance field, 2023. arXiv:2310.07916.
  149. Soft rasterizer: A differentiable renderer for image-based 3d reasoning. In IEEE/CVF International Conference on Computer Vision (ICCV) (2019).
  150. Differentiable cloth simulation for inverse problems. Advances in Neural Information Processing Systems (NeurIPS) 32 (2019).
  151. Learning the 3d fauna of the web. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2024).
  152. Semantic attention flow fields for dynamic scene decomposition. In IEEE/CVF International Conference on Computer Vision (ICCV) (2023).
  153. 3d neural scene representations for visuomotor control. In Conference on Robot Learning (CoRL) (2022), PMLR, pp. 112–123.
  154. Lamarca J., Montiel J. M. M.: Camera tracking for slam in deformable maps. In European Conference on Computer Vision Workshop (ECCVW) (2018).
  155. PARIS: Part-level reconstruction and motion analysis for articulated objects. In IEEE/CVF International Conference on Computer Vision (ICCV) (2023).
  156. Eyenerf: a hybrid representation for photorealistic synthesis, animation and relighting of human eyes. ACM Transactions on Graphics (ToG) 41, 4 (2022), 1–16.
  157. Smpl: a skinned multi-person linear model. ACM Transactions on Graphics (ToG) 34, 6 (oct 2015). URL: https://doi.org/10.1145/2816795.2818013, doi:10.1145/2816795.2818013.
  158. Neural scene flow fields for space-time view synthesis of dynamic scenes. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2021), pp. 6498–6508.
  159. Defslam: Tracking and mapping of deforming scenes from monocular sequences. IEEE Transactions on Robotics (T-RO) 37, 1 (2020), 291–303.
  160. A 128×\times× 128 120 db 15 μ𝜇\muitalic_μs latency asynchronous temporal contrast vision sensor. IEEE Journal of Solid-State Circuits 43, 2 (2008), 566–576.
  161. Neural rays for occlusion-aware image-based rendering. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022).
  162. Efficient neural radiance fields for interactive free-viewpoint video. In ACM SIGGRAPH Asia (New York, NY, USA, 2022), SA ’22, Association for Computing Machinery. doi:10.1145/3550469.3555376.
  163. High-fidelity and real-time novel view synthesis for dynamic scenes. In ACM SIGGRAPH Asia (New York, NY, USA, 2023), SA ’23, Association for Computing Machinery. doi:10.1145/3610548.3618142.
  164. Pac-nerf: Physics augmented continuum neural radiance fields for geometry-agnostic system identification. In International Conference on Learning Representations (ICLR) (2022).
  165. Flownet3d: Learning scene flow in 3d point clouds. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2019).
  166. Mocapdeform: Monocular 3d human motion capture in deformable scenes. In International Conference on 3D Vision (3DV) (2022).
  167. Streaming radiance fields for 3d video synthesis. In Advances in Neural Information Processing Systems (NeurIPS) (2022), vol. 35, pp. 13485–13498.
  168. Neural 3d video synthesis from multi-view video. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022), pp. 5521–5531.
  169. 4dcomplete: Non-rigid motion estimation beyond the observable surface. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2021), pp. 12706–12716.
  170. Tava: Template-free animatable volumetric actors. In European Conference on Computer Vision (ECCV) (2022), Springer, pp. 419–436.
  171. Dynibar: Neural dynamic image-based rendering. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023), pp. 4273–4284.
  172. Neural scene chronology. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023), pp. 20752–20761.
  173. Gart: Gaussian articulated template models. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2024).
  174. Capturing relightable human performances under general uncontrolled illumination. In Computer Graphics Forum (2013), vol. 32, Wiley Online Library, pp. 275–284.
  175. Zero-1-to-3: Zero-shot one image to 3d object. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023), pp. 9298–9309.
  176. Crowdsampling the plenoptic function. In European Conference on Computer Vision (ECCV) (2020).
  177. Deep 3d mask volume for view synthesis of dynamic scenes. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2021), pp. 1749–1758.
  178. Articulatedfusion: Real-time reconstruction of motion, geometry and segmentation using a single depth camera. In European Conference on Computer Vision (ECCV) (2018), pp. 317–332.
  179. Splitfusion: Simultaneous tracking and mapping for non-rigid scenes. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (2020), IEEE, pp. 5128–5134.
  180. Occlusionfusion: Occlusion-aware motion estimation for real-time dynamic 3d reconstruction. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022), pp. 1736–1745.
  181. Mohamed M., Agapito L.: Gnpm: Geometric-aware neural parametric models. In International Conference on 3D Vision (3DV) (2022), IEEE, pp. 166–175.
  182. Mohamed M., Agapito L.: Dynamicsurf: Dynamic neural rgb-d surface reconstruction with an optimizable feature grid. In International Conference on 3D Vision (3DV) (2024).
  183. Livehand: Real-time and photorealistic neural hand rendering. In IEEE/CVF International Conference on Computer Vision (ICCV) (October 2023).
  184. Real-time pose and shape reconstruction of two interacting hands with a single depth camera. ACM Transactions on Graphics (ToG) 38, 4 (2019), 1–13.
  185. Meagher D.: Octree Encoding: A New Technique for the Representation, Manipulation and Display of Arbitrary 3-D Objects by Computer. Tech. rep., 10 1980.
  186. Instant neural graphics primitives with a multiresolution hash encoding. ACM Transactions on Graphics (ToG) 41, 4 (2022), 102:1–102:15.
  187. P⁢C2𝑃superscript𝐶2PC^{2}italic_P italic_C start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT projection-conditioned point cloud diffusion for single-image 3d reconstruction. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023).
  188. 3d video loops from asynchronous input. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023), pp. 310–320.
  189. 3d pose estimation of two interacting hands from a monocular event camera. In International Conference on 3D Vision (3DV) (2024).
  190. Transfer4d: A framework for frugal motion capture and deformation transfer. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023), pp. 12836–12846.
  191. Occupancy networks: Learning 3d reconstruction in function space. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2019).
  192. Deformable neural radiance fields using rgb and event cameras. In IEEE/CVF International Conference on Computer Vision (ICCV) (2023), pp. 3590–3600.
  193. Avatarstudio: Text-driven editing of 3d dynamic human head avatars. ACM Transactions on Graphics (ToG) 42, 6 (dec 2023). doi:10.1145/3618368.
  194. A-sdf: Learning disentangled signed distance functions for articulated shape representation. In IEEE/CVF International Conference on Computer Vision (ICCV) (2021).
  195. Promptable game models: Text-guided game simulation via masked diffusion models. ACM Transactions on Graphics (ToG) 43, 2 (jan 2024). doi:10.1145/3635705.
  196. Diffrf: Rendering-guided 3d radiance field diffusion. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023).
  197. Nerf: Representing scenes as neural radiance fields for view synthesis. In European Conference on Computer Vision (ECCV) (2020).
  198. Three-dimensional tissue deformation recovery and tracking. IEEE Signal Processing Magazine 27, 4 (2010), 14–24. doi:10.1109/MSP.2010.936728.
  199. Multi-object monocular slam for dynamic environments. In 2020 IEEE Intelligent Vehicles Symposium (IV) (2020), IEEE, pp. 651–657.
  200. Nesterov Y. E.: A method for solving the convex programming problem with convergence rate O(1/k²). In Doklady Akademii Nauk SSSR (1983), vol. 269, pp. 543–547.
  201. Dynamicfusion: Reconstruction and tracking of non-rigid scenes in real-time. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2015), pp. 343–352.
  202. Niemeyer M., Geiger A.: Giraffe: Representing scenes as compositional generative neural feature fields. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2021), pp. 11453–11464.
  203. Watch it move: Unsupervised discovery of 3d joints for re-posing of articulated objects. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022), pp. 3677–3687.
  204. Occupancy flow: 4d reconstruction by learning particle dynamics. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2019), pp. 5379–5389.
  205. Continuous surface embeddings. Advances in Neural Information Processing Systems (NeurIPS) 33 (2020), 17258–17270.
  206. Keytr: keypoint transporter for 3d reconstruction of deformable objects in videos. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022), pp. 5595–5604.
  207. Unsupervised learning of efficient geometry-aware neural articulated representations. In European Conference on Computer Vision (ECCV) (2022), Springer, pp. 597–614.
  208. Real-time 3d reconstruction at scale using voxel hashing. ACM Transactions on Graphics (ToG) 32, 6 (2013), 1–11.
  209. Neural scene graphs for dynamic scenes. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2021), pp. 2856–2865.
  210. Unisurf: Unifying neural implicit surfaces and radiance fields for multi-view reconstruction. In IEEE/CVF International Conference on Computer Vision (ICCV) (2021).
  211. Npms: Neural parametric models for 3d deformable shapes. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2021), pp. 12695–12705.
  212. D-nerf: Neural radiance fields for dynamic scenes. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2021), Computer Vision Foundation / IEEE, pp. 10318–10327.
  213. Deepsdf: Learning continuous signed distance functions for shape representation. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (June 2019).
  214. Phong B. T.: Illumination for computer generated pictures. Communications of the ACM 18, 6 (jun 1975), 311–317. doi:10.1145/360825.360839.
  215. Dreamfusion: Text-to-3d using 2d diffusion. In International Conference on Learning Representations (ICLR) (2022).
  216. Park B., Kim C.: Point-dynrf: Point-based dynamic radiance fields from a monocular video. In IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) (January 2024), pp. 3171–3181.
  217. Dynamic point fields. In IEEE/CVF International Conference on Computer Vision (ICCV) (2023).
  218. Convolutional occupancy networks. In European Conference on Computer Vision (ECCV) (2020), Springer, pp. 523–540.
  219. Isometric non-rigid shape-from-motion with riemannian geometry solved in linear time. Transactions on Pattern Analysis and Machine Intelligence (TPAMI) 40, 10 (2017), 2442–2454.
  220. Robust isometric non-rigid structure-from-motion. Transactions on Pattern Analysis and Machine Intelligence (TPAMI) 44, 10 (2021), 6409–6423.
  221. Modalnerf: Neural modal analysis and synthesis for free-viewpoint navigation in dynamically vibrating scenes. Computer Graphics Forum 42, 4 (2023), e14888. doi:10.1111/cgf.14888.
  222. Nerfies: Deformable neural radiance fields. In IEEE/CVF International Conference on Computer Vision (ICCV) (2021), IEEE, pp. 5845–5854.
  223. Hypernerf: A higher-dimensional representation for topologically varying neural radiance fields. ACM Transactions on Graphics (ToG) 40, 6 (dec 2021).
  224. Temporal interpolation is all you need for dynamic neural radiance fields. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023), pp. 4212–4221.
  225. Spams: Structured implicit parametric models. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022), pp. 12851–12860.
  226. State of the art on diffusion models for visual computing, 2023. arXiv:2310.07204.
  227. Cagenerf: Cage-based neural radiance field for generalized 3d deformation and animation. In Advances in Neural Information Processing Systems (NeurIPS) (2022), vol. 35, pp. 31402–31415.
  228. Representing volumetric videos as dynamic mlp maps. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023), pp. 4252–4262.
  229. Measuring robustness of visual slam. In International Conference on Machine Vision Applications (MVA) (2019), IEEE, pp. 1–6.
  230. Ash: Animatable gaussian splats for efficient and photoreal human rendering. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2024).
  231. Surfels: surface elements as rendering primitives. In ACM SIGGRAPH (USA, 2000), SIGGRAPH ’00, ACM Press/Addison-Wesley Publishing Co., pp. 335–342. doi:10.1145/344779.344936.
  232. Neural body: Implicit neural representations with structured latent codes for novel view synthesis of dynamic humans. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2021), pp. 9054–9063.
  233. Novel-view synthesis and pose estimation for hand-object interaction from sparse views. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023), pp. 15100–15111.
  234. Neuphysics: Editable neural geometry and physics from monocular videos. In Advances in Neural Information Processing Systems (NeurIPS) (2022), vol. 35, pp. 12841–12854.
  235. Langsplat: 3d language gaussian splatting. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2024).
  236. On the spectral bias of neural networks. In International Conference on Machine Learning (ICML) (2019), PMLR, pp. 5301–5310.
  237. High-resolution image synthesis with latent diffusion models. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022), pp. 10684–10695.
  238. Robust visibility surface determination in object space via Plücker coordinates. Journal of Imaging 7, 6 (2021). doi:10.3390/jimaging7060096.
  239. Caspr: Learning canonical spatiotemporal point cloud representations. In Advances in Neural Information Processing Systems (NeurIPS) (2020), vol. 33, pp. 13688–13701.
  240. Eventnerf: Neural radiance fields from a single colour event camera. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023).
  241. Eventhands: Real-time neural 3d hand pose estimation from an event stream. In IEEE/CVF International Conference on Computer Vision (ICCV) (2021).
  242. Dynamic ct reconstruction from limited views with implicit neural representations and parametric motion fields. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2021), pp. 2258–2268.
  243. Towards robust monocular depth estimation: Mixing datasets for zero-shot cross-dataset transfer. Transactions on Pattern Analysis and Machine Intelligence (TPAMI) 44, 3 (2020), 1623–1637.
  244. Dreambooth: Fine tuning text-to-image diffusion models for subject-driven generation. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023), pp. 22500–22510.
  245. Nr-slam: Non-rigid monocular slam, 2023. arXiv:2308.04036.
  246. Object space EWA surface splatting: A hardware accelerated approach to high quality point rendering. In Eurographics (September 2002), vol. 21, pp. 461–470.
  247. Blirf: Bandlimited radiance fields for dynamic scene modeling, 2023. arXiv:2302.13543.
  248. Class-agnostic reconstruction of dynamic objects from videos. In Advances in Neural Information Processing Systems (NeurIPS) (2021), vol. 34, pp. 509–522.
  249. Sorkine O., Alexa M.: As-rigid-as-possible surface modeling. In Symposium on Geometry Processing (SGP) (Goslar, DEU, 2007), SGP ’07, Eurographics Association, pp. 109–116.
  250. Sifakis E., Barbic J.: Fem simulation of 3d deformable solids: a practitioner’s guide to theory, discretization and model reduction. In ACM SIGGRAPH 2012 Courses. 2012, pp. 1–50.
  251. Killingfusion: Non-rigid 3d reconstruction without correspondences. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2017), pp. 1386–1395.
  252. Sobolevfusion: 3d reconstruction of scenes undergoing free non-rigid motion. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2018), pp. 2646–2655.
  253. Variational level set evolution for non-rigid 3d reconstruction from a single depth camera. Transactions on Pattern Analysis and Machine Intelligence (TPAMI) 43, 8 (2020), 2838–2850.
  254. Danbo: Disentangled articulated neural body representations via graph neural networks. In European Conference on Computer Vision (ECCV) (2022), Springer, pp. 107–124.
  255. Npc: Neural point characters from video. In IEEE/CVF International Conference on Computer Vision (ICCV) (2023).
  256. Moda: Modeling deformable 3d objects from casual videos, 2023. arXiv:2304.08279.
  257. Nerfplayer: A streamable dynamic scene representation with decomposed neural radiance fields. IEEE Transactions on Visualization and Computer Graphics (TVCG) 29, 5 (2023), 2732–2742.
  258. Nerv: Neural reflectance and visibility fields for relighting and view synthesis. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2021), pp. 7495–7504.
  259. Light field neural rendering. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022).
  260. Schonberger J. L., Frahm J.-M.: Structure-from-motion revisited. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2016), pp. 4104–4113.
  261. A shape completion component for monocular non-rigid slam. In IEEE International Symposium on Mixed and Augmented Reality Adjunct (ISMAR-Adjunct) (2019), IEEE, pp. 332–337.
  262. Pref: Predictability regularized neural motion fields. In European Conference on Computer Vision (ECCV) (2022), Springer, pp. 664–681.
  263. Decaf: Monocular deformation capture for face and hand interactions. ACM Transactions on Graphics (ToG) 42, 6 (2023).
  264. IsMo-GAN: Adversarial learning for monocular non-rigid 3d reconstruction. In IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshop (CVPRW) (2019).
  265. Neural monocular 3d human motion capture with physical awareness. ACM Transactions on Graphics (ToG) 40, 4 (2021).
  266. Physcap: Physically plausible monocular 3d motion capture in real time. ACM Transactions on Graphics (ToG) 39, 6 (2020).
  267. Deep marching tetrahedra: a hybrid representation for high-resolution 3d shape synthesis. In Advances in Neural Information Processing Systems (NeurIPS) (2021).
  268. Pifu: Pixel-aligned implicit function for high-resolution clothed human digitization. In IEEE/CVF International Conference on Computer Vision (ICCV) (2019), pp. 2304–2314.
  269. Acid: Action-conditional implicit visual dynamics for deformable object manipulation. In Robotics: Science and Systems (RSS) (2022).
  270. 3dgstream: On-the-fly training of 3d gaussians for efficient streaming of photo-realistic free-viewpoint videos. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2024).
  271. Sederberg T. W., Parry S. R.: Free-form deformation of solid geometric models. In ACM SIGGRAPH (New York, NY, USA, 1986), SIGGRAPH ’86, Association for Computing Machinery, pp. 151–160. doi:10.1145/15922.15903.
  272. Sclaroff S., Pentland A.: Generalized implicit functions for computer graphics. In ACM SIGGRAPH (New York, NY, USA, 1991), SIGGRAPH ’91, Association for Computing Machinery, pp. 247–250. doi:10.1145/122718.122745.
  273. Score-based generative modeling through stochastic differential equations. In International Conference on Learning Representations (ICLR) (2021).
  274. Voxgraf: Fast 3d-aware image synthesis with sparse voxel grids. In Advances in Neural Information Processing Systems (NeurIPS) (2022).
  275. Embedded deformation for shape manipulation. ACM Transactions on Graphics (ToG) 26, 3 (jul 2007), 80–es. doi:10.1145/1276377.1276478.
  276. Text-to-4d dynamic scene generation. In International Conference on Machine Learning (ICML) (2023), ICML’23, JMLR.org.
  277. Common pets in 3d: Dynamic new-view synthesis of real-life deformable categories. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023), pp. 4881–4891.
  278. Pushing the boundaries of view extrapolation with multiplane images. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2019).
  279. Neural dense non-rigid structure from motion with latent space constraints. In European Conference on Computer Vision (ECCV) (2020), Springer, pp. 204–222.
  280. Physics-guided shape-from-template: Monocular video perception through neural surrogate models. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2024).
  281. Constraining dense hand surface tracking with elasticity. ACM Transactions on Graphics (ToG) 39, 6 (2020), 1–14.
  282. Robustfusion: Robust volumetric performance reconstruction under human-object interactions from monocular rgbd stream. Transactions on Pattern Analysis and Machine Intelligence (TPAMI) (2022).
  283. Total-recon: Deformable scene reconstruction for embodied view synthesis. In IEEE/CVF International Conference on Computer Vision (ICCV) (2023).
  284. Pixelwise view selection for unstructured multi-view stereo. In European Conference on Computer Vision (ECCV) (2016), Springer, pp. 501–518.
  285. Tensor4d: Efficient neural 4d decomposition for high-fidelity dynamic reconstruction and rendering. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023), pp. 16632–16642.
  286. Scene representation networks: Continuous 3d-structure-aware neural scene representations. In Advances in Neural Information Processing Systems (NeurIPS) (2019), vol. 32.
  287. Tsoli A., Argyros A. A.: Joint 3d tracking of a deformable object in interaction with a hand. In European Conference on Computer Vision (ECCV) (2018).
  288. Seeing people in different light-joint shape, motion, and reflectance capture. IEEE Transactions on Visualization and Computer Graphics (TVCG) 13, 4 (2007), 663–674.
  289. Teed Z., Deng J.: Raft: Recurrent all-pairs field transforms for optical flow. In European Conference on Computer Vision (ECCV) (2020).
  290. MonoNeRF: Learning a generalizable dynamic radiance field from monocular videos. In IEEE/CVF International Conference on Computer Vision (ICCV) (2023).
  291. EPIC Fields: Marrying 3D Geometry and Video Understanding. In Advances in Neural Information Processing Systems (NeurIPS) (2023).
  292. State of the art on neural rendering. Computer Graphics Forum 39, 2 (2020), 701–727. doi:10.1111/cgf.14022.
  293. Scenerflow: Time-consistent reconstruction of general dynamic scenes. In International Conference on 3D Vision (3DV) (2024).
  294. State of the art in dense monocular non-rigid 3d reconstruction. Computer Graphics Forum 42, 2 (2023), 485–520. doi:10.1111/cgf.14774.
  295. Neural feature fusion fields: 3d distillation of selfsupervised 2d image representations. In International Conference on 3D Vision (3DV) (2022).
  296. Neuraldiff: Segmenting 3d objects that move in egocentric videos. In International Conference on 3D Vision (3DV) (2021), IEEE, pp. 910–919.
  297. Cla-nerf: Category-level articulated neural radiance field. In IEEE International Conference on Robotics and Automation (ICRA) (2022), IEEE, pp. 8454–8460.
  298. Neural shape deformation priors. In Advances in Neural Information Processing Systems (NeurIPS) (2022), vol. 35, pp. 17117–17132.
  299. Dreamgaussian: Generative gaussian splatting for efficient 3d content creation. In International Conference on Learning Representations (ICLR) (2024).
  300. Tucker R., Snavely N.: Single-view view synthesis with multiplane images. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2020).
  301. Fourier features let networks learn high frequency functions in low dimensional domains. In Advances in Neural Information Processing Systems (NeurIPS) (2020), vol. 33, pp. 7537–7547.
  302. Neural-gif: Neural generalized implicit functions for animating people in clothing. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2021), pp. 11708–11718.
  303. Non-rigid neural radiance fields: Reconstruction and novel view synthesis of a dynamic scene from monocular video. In IEEE/CVF International Conference on Computer Vision (ICCV) (2021), IEEE, pp. 12939–12950.
  304. Advances in neural rendering. Computer Graphics Forum 41, 2 (2022), 703–735. doi:10.1111/cgf.14507.
  305. Learning parallel dense correspondence from spatio-temporal descriptors for efficient and robust 4d reconstruction. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2021), pp. 6022–6031.
  306. Distilling neural fields for real-time articulated shape reconstruction. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023), pp. 4692–4701.
  307. Suds: Scalable urban dynamic scenes. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023), pp. 12375–12385.
  308. Template-free articulated neural point clouds for reposable view synthesis. In Advances in Neural Information Processing Systems (NeurIPS) (2023). URL: https://openreview.net/forum?id=fyfmHi8ay3.
  309. Three-dimensional scene flow. In IEEE/CVF International Conference on Computer Vision (ICCV) (1999), vol. 2, IEEE, pp. 722–729.
  310. Revealing occlusions with 4d neural fields. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022), pp. 3011–3021.
  311. Recovering accurate 3d human pose in the wild using imus and a moving camera. In European Conference on Computer Vision (ECCV) (September 2018).
  312. Rfnet-4d: Joint object reconstruction and flow estimation from 4d point clouds. In European Conference on Computer Vision (ECCV) (2022), Springer, pp. 36–52.
  313. Attention is all you need. In Advances in Neural Information Processing Systems (NeurIPS) (2017).
  314. Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing 13, 4 (2004), 600–612. doi:10.1109/TIP.2003.819861.
  315. Tracking everything everywhere all at once. In IEEE/CVF International Conference on Computer Vision (ICCV) (2023).
  316. Self-supervised neural articulated shape and appearance models. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022), pp. 15816–15826.
  317. HumanNeRF: Free-viewpoint rendering of moving people from monocular video. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (June 2022), pp. 16210–16220.
  318. Root pose decomposition towards generic non-rigid 3d reconstruction with monocular videos. In IEEE/CVF International Conference on Computer Vision (ICCV) (2023).
  319. Neural trajectory fields for dynamic novel view synthesis, 2021. arXiv:2105.05994.
  320. Virdo: Visio-tactile implicit representations of deformable objects. In IEEE International Conference on Robotics and Automation (ICRA) (2022), IEEE, pp. 3583–3590.
  321. Neural residual radiance fields for streamably free-viewpoint videos. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023), pp. 76–87.
  322. Neus2: Fast learning of neural implicit surfaces for multi-view reconstruction. In IEEE/CVF International Conference on Computer Vision (ICCV) (2023), pp. 3295–3306.
  323. Simnp: Learning self-similarity priors between neural points. In IEEE/CVF International Conference on Computer Vision (ICCV) (2023).
  324. Dove: Learning deformable 3d objects by watching videos. International Journal of Computer Vision (IJCV) 131, 10 (2023), 2623–2634.
  325. F2-nerf: Fast neural radiance field training with free camera trajectories. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023).
  326. Neural rendering for stereo 3d reconstruction of deformable tissues in robotic surgery. In International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI) (2022), Springer, pp. 431–441.
  327. Flownet3d++: Geometric losses for deep scene flow estimation. In IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) (2020), pp. 91–98.
  328. Magicpony: Learning articulated 3d animals in the wild. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023), pp. 8792–8802.
  329. Neus: Learning neural implicit surfaces by volume rendering for multi-view reconstruction. In Advances in Neural Information Processing Systems (NeurIPS) (2021), Ranzato M., Beygelzimer A., Dauphin Y. N., Liang P., Vaughan J. W., (Eds.), pp. 27171–27183.
  330. Neural prior for trajectory estimation. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022), pp. 6532–6542.
  331. Wong Y.-S., Mitra N. J.: Factored neural representation for scene understanding. Computer Graphics Forum 42, 5 (2023), e14911. doi:10.1111/cgf.14911.
  332. Dynamical scene representation and control with keypoint-conditioned neural radiance field. In IEEE International Conference on Automation Science and Engineering (CASE) (2022), IEEE, pp. 1138–1143.
  333. Flow supervision for deformable nerf. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023), pp. 21128–21137.
  334. Nex: Real-time view synthesis with neural basis expansion. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2021).
  335. On-set performance capture of multiple actors with a stereo camera. ACM Transactions on Graphics (ToG) 32, 6 (2013), 1–11.
  336. Mixed neural voxels for fast multi-view video synthesis. In IEEE/CVF International Conference on Computer Vision (ICCV) (2023), pp. 19706–19716.
  337. Ibrnet: Learning multi-view image-based rendering. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2021), pp. 4690–4699.
  338. 4d gaussian splatting for real-time dynamic scene rendering. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2024).
  339. Virdo++: Real-world, visuo-tactile dynamics and perception of deformable objects. In Conference on Robot Learning (CoRL) (2022).
  340. Fourier plenoctrees for dynamic radiance field rendering in real-time. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022), pp. 13524–13534.
  341. Real-time shading-based refinement for consumer depth cameras. ACM Transactions on Graphics (ToG) 33, 6 (2014), 1–10.
  342. NEMTO: Neural Environment Matting for Novel View and Relighting Synthesis of Transparent Objects. In IEEE/CVF International Conference on Computer Vision (ICCV) (2023).
  343. D²NeRF: Self-supervised decoupling of dynamic and static objects from a monocular video. In Advances in Neural Information Processing Systems (NeurIPS) (2022), vol. 35, pp. 32653–32666.
  344. H-nerf: Neural radiance fields for rendering and temporal reconstruction of humans in motion. In Advances in Neural Information Processing Systems (NeurIPS) (2021), vol. 34, pp. 14955–14966.
  345. Xing W., Chen J.: Temporal-mpi: Enabling multi-plane images for dynamic scene modelling via temporal basis learning. In European Conference on Computer Vision (ECCV) (2022), Springer, pp. 323–338.
  346. Xu T., Harada T.: Deforming radiance fields with cages. In European Conference on Computer Vision (ECCV) (2022), Springer, pp. 159–175.
  347. Space-time neural irradiance fields for free-viewpoint video. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2021), pp. 9421–9431.
  348. Desrf: Deformable stylized radiance field. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023), pp. 709–718.
  349. Identity-disentangled neural deformation model for dynamic meshes, 2021. arXiv:2109.15299.
  350. 4k4d: Real-time 4d view synthesis at 4k resolution. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2024).
  351. Physics-based human motion estimation and synthesis from videos. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2021).
  352. Point-nerf: Point-based neural radiance fields. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022).
  353. Hmdo: Markerless multi-view hand manipulation capture with deformable objects. Graphical Models 127 (2023), 101178.
  354. Rignet: Neural rigging for articulated characters. ACM Transactions on Graphics (ToG) 39 (2020).
  355. Volume rendering of neural implicit surfaces. In Advances in Neural Information Processing Systems (NeurIPS) (2021), Ranzato M., Beygelzimer A., Dauphin Y. N., Liang P., Vaughan J. W., (Eds.), pp. 4805–4815.
  356. Unsupervised discovery of object radiance fields. In International Conference on Learning Representations (ICLR) (2021).
  357. Deformable 3d gaussians for high-fidelity monocular dynamic scene reconstruction. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2024).
  358. You M., Hou J.: Decoupling dynamic monocular videos for dynamic view synthesis, 2023. arXiv:2304.01716.
  359. Hi-lassie: High-fidelity articulated shape and skeleton discovery from sparse image ensemble. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023), pp. 4853–4862.
  360. Gencorres: Consistent shape matching via coupled implicit-explicit shape generative models. In International Conference on Learning Representations (ICLR) (May 2024).
  361. Dylin: Making light field networks dynamic. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023), pp. 12397–12406.
  362. Novel view synthesis of dynamic scenes with globally coherent depths from a monocular camera. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2020), pp. 5336–5345.
  363. Gaussian-slam: Photo-realistic dense slam with gaussian splatting, 2023. arXiv:2312.10070.
  364. Ds-slam: A semantic visual slam towards dynamic environments. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (2018), IEEE, pp. 1168–1174.
  365. Nerf-ds: Neural radiance fields for dynamic specular objects. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023).
  366. Od-nerf: Efficient training of on-the-fly dynamic neural radiance fields, 2023. arXiv:2305.14831.
  367. Plenoctrees for real-time rendering of neural radiance fields. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2021), pp. 5752–5761.
  368. A revisit of shape editing techniques: From the geometric to the neural viewpoint. Journal of Computer Science and Technology 36, 3 (jun 2021), 520–554. doi:10.1007/s11390-021-1414-9.
  369. Artic3d: Learning robust articulated 3d shapes from noisy web image collections. In Advances in Neural Information Processing Systems (NeurIPS) (2023).
  370. Lasr: Learning articulated shape reconstruction from a monocular video. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2021), pp. 15980–15989.

Summary

  • The survey reviews hybrid neural scene representations that combine voxel grids and feature planes with implicit neural models to improve reconstruction speed and realism.
  • The survey highlights how classical non-neural representations complement modern neural techniques, simplifying scene editing and enabling real-time performance.
  • The survey outlines future directions using generative models and data-driven priors to build generalizable systems for reconstructing diverse dynamic non-rigid scenes.

Recent Advancements in 3D Reconstruction of Dynamic Scenes

Introduction to 3D Reconstruction Challenges in Dynamic Environments

3D reconstruction of dynamic, non-rigid scenes is a severely underconstrained problem in computer vision and computer graphics. The difficulty arises primarily from inferring 3D geometry and appearance from 2D observations of scenes that evolve over time. Applications across movie production, augmented and virtual reality (AR/VR), and interaction design depend on robust 3D reconstruction techniques to interpret real-world dynamics accurately. Scene motion introduces ambiguities in depth perception, occlusion handling, and deformation modeling, necessitating advanced computational approaches for accurate reconstruction.

Emergence of Neural Scene Representations

Neural scene representations have transformed 3D reconstruction methodologies by offering a flexible and unified framework for capturing complex scene dynamics. At the core of these advances is the encoding of scenes into implicit neural representations, typically Neural Radiance Fields (NeRFs) and their variants. These models have been extended to capture temporal variations, enabling dynamic scene reconstruction and novel view synthesis. Despite their potential, the computational overhead of training and inference in neural scene models poses significant challenges for real-time applications.
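To make the core mechanism concrete, the following is a minimal, illustrative sketch (not code from any surveyed method) of a time-conditioned radiance field: a spatial position and a time value are encoded, mapped to density and color by a small network (here an untrained random-weight stand-in), and a pixel color is obtained by volume rendering along a ray. All names, sizes, and the space-time conditioning scheme are assumptions chosen for clarity; many methods instead predict a deformation into a canonical static field.

```python
import numpy as np

def positional_encoding(x, num_freqs=4):
    # Map coordinates to sin/cos features, as in NeRF-style models.
    freqs = 2.0 ** np.arange(num_freqs)            # (F,)
    angles = x[..., None] * freqs                  # (..., D, F)
    enc = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return enc.reshape(*x.shape[:-1], -1)          # (..., 2*D*F)

class TinyDynamicField:
    """Stand-in for a time-conditioned radiance field: (x, t) -> (density, rgb)."""
    def __init__(self, in_dim, hidden=32, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(0.0, 0.1, (in_dim, hidden))
        self.w2 = rng.normal(0.0, 0.1, (hidden, 4))  # 1 density + 3 rgb

    def __call__(self, xyz, t):
        # Concatenate space and time before encoding; real systems may
        # instead predict a deformation into a canonical static field.
        xt = np.concatenate([xyz, np.full(xyz.shape[:-1] + (1,), t)], axis=-1)
        h = np.tanh(positional_encoding(xt) @ self.w1)
        out = h @ self.w2
        sigma = np.log1p(np.exp(out[..., 0]))        # softplus -> nonnegative density
        rgb = 1.0 / (1.0 + np.exp(-out[..., 1:]))    # sigmoid -> colors in [0, 1]
        return sigma, rgb

def render_ray(field, origin, direction, t, near=0.0, far=1.0, n_samples=32):
    # Standard volume rendering: alpha-composite samples along the ray.
    zs = np.linspace(near, far, n_samples)
    pts = origin + zs[:, None] * direction           # (S, 3) sample points
    sigma, rgb = field(pts, t)
    delta = (far - near) / n_samples
    alpha = 1.0 - np.exp(-sigma * delta)
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))
    weights = alpha * trans
    return (weights[:, None] * rgb).sum(axis=0)      # (3,) pixel color

field = TinyDynamicField(in_dim=4 * 2 * 4)           # 4 coords, sin+cos, 4 freqs
color = render_ray(field, np.zeros(3), np.array([0.0, 0.0, 1.0]), t=0.5)
```

Training would fit the network weights by minimizing a photometric loss between many such rendered rays and the observed video frames.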

Hybrid Neural Scene Representations: Bridging Efficiency and Realism

Hybrid representations have emerged as a potent solution to the computational challenges posed by pure neural scene models. By integrating neural components with traditional data structures like voxel grids, feature planes, and point clouds, hybrid models achieve significant improvements in training and rendering speeds. These representations facilitate efficient querying and manipulation of scene features, enabling real-time applications in dynamic environments. Notably, such models allow decoupling scene representation from rendering, further enhancing editability and control over scene dynamics.
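The efficiency gain of such hybrid models comes from replacing most of the deep network with cheap interpolation into an explicit grid. As an illustrative sketch (sizes and names are assumptions, not from any particular method), a tri-plane lookup in the spirit of plane-factorized representations reduces a 3D query to three bilinear interpolations whose summed features a small decoder would then turn into density and color; dynamic variants such as HexPlane or K-Planes add further planes indexed by time.

```python
import numpy as np

def bilinear_sample(plane, uv):
    """Sample an (H, W, C) feature plane at continuous coords uv in [0, 1]^2.

    uv is a batch of 2D coordinates of shape (N, 2).
    """
    H, W, _ = plane.shape
    x = uv[..., 0] * (W - 1)
    y = uv[..., 1] * (H - 1)
    x0, y0 = np.floor(x).astype(int), np.floor(y).astype(int)
    x1, y1 = np.minimum(x0 + 1, W - 1), np.minimum(y0 + 1, H - 1)
    fx, fy = (x - x0)[..., None], (y - y0)[..., None]
    top = plane[y0, x0] * (1 - fx) + plane[y0, x1] * fx
    bot = plane[y1, x0] * (1 - fx) + plane[y1, x1] * fx
    return top * (1 - fy) + bot * fy                 # (N, C)

def query_triplane(planes, xyz):
    """Tri-plane lookup: project 3D points (in [0, 1]^3) onto the XY/XZ/YZ
    planes and sum the interpolated features."""
    f_xy = bilinear_sample(planes["xy"], xyz[..., [0, 1]])
    f_xz = bilinear_sample(planes["xz"], xyz[..., [0, 2]])
    f_yz = bilinear_sample(planes["yz"], xyz[..., [1, 2]])
    return f_xy + f_xz + f_yz   # a small MLP would decode this to density/color

rng = np.random.default_rng(0)
planes = {k: rng.normal(0.0, 0.1, (16, 16, 8)) for k in ("xy", "xz", "yz")}
pts = np.array([[0.25, 0.5, 0.75], [0.1, 0.9, 0.3]])
feat = query_triplane(planes, pts)                   # (2, 8) feature vectors
```

Because the bulk of the capacity lives in the planes rather than the decoder, gradients during training flow into local grid cells, which is what makes optimization and rendering fast compared with a purely implicit model.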

Navigating Challenges with Non-Neural Representations

Despite the advancements in neural and hybrid representations, non-neural approaches continue to play a crucial role in scenarios where data-driven methods face limitations. These methods, leveraging classical representations such as meshes, voxels, and surfels, offer direct control over geometric and appearance properties of scenes, simplifying edits and interactions. Real-time performance, a critical requirement in many applications, remains more achievable through these classical approaches, particularly when dealing with geometry reconstruction and tracking in dynamic scenes.
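As a concrete example of such a classical representation, the sketch below fuses a depth map into a truncated signed distance function (TSDF) voxel grid using the standard weighted running average, in the spirit of KinectFusion-style pipelines. The orthographic sensor model and all grid sizes are simplifications chosen for illustration; non-rigid variants such as DynamicFusion additionally warp the voxels by an estimated deformation field before applying this update.

```python
import numpy as np

VOX, SIZE, TRUNC = 32, 1.0, 0.1      # grid resolution, cube side (m), truncation (m)

tsdf = np.ones((VOX, VOX, VOX))      # truncated signed distance, in [-1, 1]
weight = np.zeros((VOX, VOX, VOX))   # per-voxel integration weight

# Voxel-center coordinates of a unit cube in front of the sensor.
c = (np.arange(VOX) + 0.5) * (SIZE / VOX)
X, Y, Z = np.meshgrid(c, c, c, indexing="ij")

def integrate(depth, max_weight=64.0):
    """Fuse one depth map (H, W) from an orthographic sensor looking down +z."""
    global tsdf, weight
    # Index the depth map by each voxel's (x, y) footprint.
    u = np.clip((X / SIZE * depth.shape[1]).astype(int), 0, depth.shape[1] - 1)
    v = np.clip((Y / SIZE * depth.shape[0]).astype(int), 0, depth.shape[0] - 1)
    sdf = np.clip((depth[v, u] - Z) / TRUNC, -1.0, 1.0)   # signed, truncated
    seen = depth[v, u] > 0                                 # ignore invalid pixels
    w_new = np.where(seen, 1.0, 0.0)
    tsdf = np.where(
        seen,
        (tsdf * weight + sdf * w_new) / np.maximum(weight + w_new, 1e-9),
        tsdf,
    )
    weight = np.minimum(weight + w_new, max_weight)

# A flat surface at depth 0.5 m: after fusion the zero crossing sits mid-grid.
integrate(np.full((64, 64), 0.5))
```

The surface is recovered afterwards as the zero-level set of the fused grid, typically via marching cubes; the direct access to per-voxel values is exactly what makes edits and real-time updates straightforward in these classical pipelines.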

Towards Generalizable and Generative Modeling

A promising direction for future research is the development of generalizable and generative models for non-rigid scene reconstruction. Learning data-driven priors from large datasets has the potential to address the intrinsic challenges of scene dynamics, allowing models to generalize across different scenes and articulated objects. Generative models, leveraging techniques such as diffusion models, open new possibilities for scene synthesis and editing, enabling the generation of realistic and consistent dynamic scenes from sparse data or textual descriptions.
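To illustrate the machinery such generative priors build on, the toy sketch below runs the diffusion forward (noising) process and a deterministic DDIM-style reconstruction. The noise predictor here is an oracle that returns the true noise, so the data is recovered exactly; a real system substitutes a trained network and, for 3D, often distills its guidance into a scene representation through losses such as score distillation. All quantities and the schedule are illustrative assumptions.

```python
import numpy as np

T = 100
betas = np.linspace(1e-4, 0.02, T)        # standard linear noise schedule
alphas_bar = np.cumprod(1.0 - betas)      # cumulative signal-retention factors

def forward_noise(x0, t, eps):
    """q(x_t | x_0): scale the data down and mix in Gaussian noise."""
    return np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1.0 - alphas_bar[t]) * eps

def predict_x0(xt, t, eps_pred):
    """Deterministic DDIM-style estimate of x_0 from x_t and predicted noise."""
    return (xt - np.sqrt(1.0 - alphas_bar[t]) * eps_pred) / np.sqrt(alphas_bar[t])

rng = np.random.default_rng(0)
x0 = rng.normal(size=8)                   # stand-in for scene parameters
eps = rng.normal(size=8)
xt = forward_noise(x0, T - 1, eps)
x0_hat = predict_x0(xt, T - 1, eps)       # oracle noise predictor: exact recovery
```

In practice the predictor is a learned network conditioned on, for example, text or sparse views, and iterating this estimate over decreasing timesteps yields the generated sample.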

Open Challenges and Future Directions

The field of 3D reconstruction of dynamic scenes is ripe with open challenges and opportunities for innovation. Topics such as intrinsic decomposition and relighting, faster scene representations for real-time applications, reliable camera pose estimation in dynamic environments, and physics-based methods for enhanced realism are pivotal areas awaiting further exploration. Moreover, embracing compositionality and multi-object interaction, leveraging specialized sensors, and exploring the intersection with generative AI models present fertile grounds for advancing the state-of-the-art in dynamic scene reconstruction.

Conclusion

The advancements in 3D reconstruction of dynamic scenes highlight the field's rapid evolution, driven by the convergence of neural representations, hybrid models, and classical approaches. As the community continues to tackle the inherent challenges of dynamic environments, the development of efficient, generalizable, and generative models stands as a cornerstone for future breakthroughs. These advancements promise to revolutionize applications across entertainment, AR/VR, and interactive systems, offering unprecedented realism and interactivity in digital content creation and consumption.