Video2Game: Real-time, Interactive, Realistic and Browser-Compatible Environment from a Single Video (2404.09833v1)
Abstract: Creating high-quality and interactive virtual environments, such as games and simulators, often involves complex and costly manual modeling processes. In this paper, we present Video2Game, a novel approach that automatically converts videos of real-world scenes into realistic and interactive game environments. At the heart of our system are three core components: (i) a neural radiance fields (NeRF) module that effectively captures the geometry and visual appearance of the scene; (ii) a mesh module that distills the knowledge from NeRF for faster rendering; and (iii) a physics module that models the interactions and physical dynamics among the objects. By following this carefully designed pipeline, one can construct an interactable and actionable digital replica of the real world. We benchmark our system on both indoor and large-scale outdoor scenes. We show that we can not only produce highly realistic renderings in real time, but also build interactive games on top of them.
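The three components named in the abstract form a linear pipeline, which can be sketched as a chain of stages. This is only a minimal illustration of the data flow, not the authors' implementation: every class and function name below (`RadianceField`, `TexturedMesh`, `GameScene`, `fit_nerf`, `distill_mesh`, `attach_physics`, `video_to_game`) is hypothetical, and each stage body is a placeholder for the real optimization or processing step.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RadianceField:
    """Hypothetical placeholder for a trained NeRF (geometry + appearance)."""
    num_frames: int


@dataclass
class TexturedMesh:
    """Hypothetical placeholder for a mesh distilled from the NeRF."""
    source: RadianceField


@dataclass
class GameScene:
    """Hypothetical placeholder for a mesh with physics attached."""
    mesh: TexturedMesh
    physics_enabled: bool


def fit_nerf(frames: List[str]) -> RadianceField:
    # Stage (i): a real module would optimize a NeRF on the video frames.
    if not frames:
        raise ValueError("need at least one video frame")
    return RadianceField(num_frames=len(frames))


def distill_mesh(field: RadianceField) -> TexturedMesh:
    # Stage (ii): a real module would extract and texture a mesh for fast rendering.
    return TexturedMesh(source=field)


def attach_physics(mesh: TexturedMesh) -> GameScene:
    # Stage (iii): a real module would add colliders and physical parameters.
    return GameScene(mesh=mesh, physics_enabled=True)


def video_to_game(frames: List[str]) -> GameScene:
    # Compose the three stages in the order the abstract lists them.
    return attach_physics(distill_mesh(fit_nerf(frames)))


scene = video_to_game(["frame_000.png", "frame_001.png"])
```

Keeping each stage as a separate function with a typed output mirrors the abstract's description: the mesh stage consumes only what the NeRF stage produced, and the physics stage consumes only the mesh.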