
Video2Game: Real-time, Interactive, Realistic and Browser-Compatible Environment from a Single Video (2404.09833v1)

Published 15 Apr 2024 in cs.CV and cs.AI

Abstract: Creating high-quality and interactive virtual environments, such as games and simulators, often involves complex and costly manual modeling processes. In this paper, we present Video2Game, a novel approach that automatically converts videos of real-world scenes into realistic and interactive game environments. At the heart of our system are three core components: (i) a neural radiance fields (NeRF) module that effectively captures the geometry and visual appearance of the scene; (ii) a mesh module that distills the knowledge from NeRF for faster rendering; and (iii) a physics module that models the interactions and physical dynamics among the objects. By following the carefully designed pipeline, one can construct an interactable and actionable digital replica of the real world. We benchmark our system on both indoor and large-scale outdoor scenes. We show that we can not only produce highly-realistic renderings in real-time, but also build interactive games on top.


Summary

  • The paper introduces Video2Game, which automatically converts single videos into interactive, photo-realistic 3D environments by integrating neural radiance fields, mesh conversion, and a physics module.
  • The methodology leverages advanced NeRF modeling enhanced with depth cues, semantic understanding, and normal prediction to capture detailed geometry and high-fidelity visuals.
  • The framework supports real-time, browser-compatible interactions, drastically reducing development time and cost for game prototyping and virtual simulations.

Transforming Videos into Interactive Game Environments with Video2Game

Introduction

The development of virtual environments for video games, VR applications, and simulators is a complex and often costly process, requiring a multidisciplinary effort from artists, programmers, and engineers to create realistic, interactive 3D spaces. The paper introduces Video2Game, a novel framework that leverages videos of real-world scenes to automatically generate interactive and realistic game environments. The core of Video2Game combines neural radiance fields (NeRF) for high-fidelity visual modeling, a mesh representation for efficient rendering, and a physics module for realistic interactions. These components are integrated into a WebGL-based game engine, allowing real-time user interaction in browser-compatible virtual worlds; a high-level sketch of the pipeline follows below.
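Conceptually, the system composes three stages run in sequence. The Python sketch below is purely illustrative: the function name `video_to_game` and its stage callables are hypothetical stand-ins for the components described in the next section, not the authors' API.

```python
# Illustrative composition of the three Video2Game stages described above.
# Every name here is a hypothetical placeholder, not the paper's code.

def video_to_game(frames, camera_poses, *, train_nerf, bake_mesh, add_physics):
    """Run the pipeline end to end. Each stage is supplied as a callable
    so the sketch stays self-contained."""
    nerf = train_nerf(frames, camera_poses)  # (i) capture geometry + appearance
    mesh = bake_mesh(nerf)                   # (ii) distill NeRF into a fast mesh
    scene = add_physics(mesh)                # (iii) attach colliders and dynamics
    return scene                             # ready for a WebGL-based front end
```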

Core Components

  • Neural Radiance Fields (NeRF): Video2Game utilizes an advanced NeRF model that captures the detailed geometry and visual appearance of the scene. NeRF models have shown promise in rendering photo-realistic static scenes from sparse image data; the paper extends them to large-scale scenes, augmenting the standard photometric objective with depth cues, semantic understanding, and normal-vector prediction to improve the realism and geometric detail of the reconstruction (a sketch of such an augmented loss appears after this list).
  • Mesh Module: NeRF's high-quality rendering comes at a computational cost that inhibits real-time interaction. The authors address this by distilling the NeRF representation into meshes with neural textures, which render significantly faster while retaining visual fidelity and integrate readily with game engines (see the mesh-extraction sketch below).
  • Physics Module: To enrich the interactive experience beyond passive visual exploration, Video2Game incorporates a physics module that simulates physical dynamics and interactions within the virtual environment, such as collisions and object manipulation (a toy rigid-body integrator below illustrates the idea).
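To make the NeRF bullet concrete, here is a minimal PyTorch sketch of a photometric NeRF loss augmented with monocular depth and normal supervision. The loss weights, the scale-and-shift alignment trick, and all function names are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def align_scale_shift(src, dst):
    """Least-squares scale and shift aligning src depth to dst depth,
    since monocular depth is only defined up to an affine transform."""
    src_flat, dst_flat = src.reshape(-1), dst.reshape(-1)
    A = torch.stack([src_flat, torch.ones_like(src_flat)], dim=-1)
    sol = torch.linalg.lstsq(A, dst_flat.unsqueeze(-1)).solution
    return sol[0], sol[1]

def nerf_training_loss(pred_rgb, gt_rgb, pred_depth, mono_depth,
                       pred_normal, mono_normal,
                       w_depth=0.05, w_normal=0.05):
    """Photometric loss plus monocular depth/normal supervision.
    Weights and alignment scheme are illustrative, not the paper's."""
    loss_rgb = F.mse_loss(pred_rgb, gt_rgb)

    # Align the scale/shift-ambiguous monocular depth to the rendered
    # depth (detached so the alignment itself carries no gradient).
    s, t = align_scale_shift(mono_depth, pred_depth.detach())
    loss_depth = F.l1_loss(pred_depth, s * mono_depth + t)

    # Penalize angular disagreement between rendered and predicted normals.
    loss_normal = (1.0 - F.cosine_similarity(pred_normal, mono_normal, dim=-1)).mean()

    return loss_rgb + w_depth * loss_depth + w_normal * loss_normal
```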
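For the mesh module, one standard recipe (assumed here; the paper's baking procedure may differ in detail) is to sample the NeRF density on a regular grid, run marching cubes to extract a triangle mesh, and then bake appearance into textures. The sketch below covers the geometry-extraction step; `density_fn` is a hypothetical handle to the trained density field.

```python
import numpy as np
import trimesh
from skimage import measure

def extract_mesh(density_fn, bbox_min, bbox_max, resolution=256, level=25.0):
    """Run marching cubes on a NeRF density field sampled on a grid.

    density_fn: callable mapping (N, 3) points to (N,) densities.
    level: density value treated as the surface (scene-dependent).
    """
    # Sample the density field on a regular grid inside the bounding box.
    xs = np.linspace(bbox_min[0], bbox_max[0], resolution)
    ys = np.linspace(bbox_min[1], bbox_max[1], resolution)
    zs = np.linspace(bbox_min[2], bbox_max[2], resolution)
    grid = np.stack(np.meshgrid(xs, ys, zs, indexing="ij"), axis=-1)
    sigma = density_fn(grid.reshape(-1, 3)).reshape(resolution, resolution, resolution)

    # Marching cubes extracts the iso-surface at the chosen density level.
    verts, faces, _, _ = measure.marching_cubes(sigma, level=level)

    # Rescale vertices from grid indices back to world coordinates.
    scale = (np.asarray(bbox_max) - np.asarray(bbox_min)) / (resolution - 1)
    verts = verts * scale + np.asarray(bbox_min)
    return trimesh.Trimesh(vertices=verts, faces=faces)
```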
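Finally, for the physics module: in the full system the dynamics run inside the browser-based engine, so the toy Python integrator below only illustrates the basic idea of rigid-body simulation, semi-implicit Euler integration plus collision response against a ground plane. It is a deliberately minimal stand-in, not the paper's physics engine.

```python
import numpy as np

class RigidBody:
    """Toy rigid body with gravity and ground-plane collision."""

    def __init__(self, position, velocity, mass=1.0, restitution=0.5):
        self.position = np.asarray(position, dtype=float)
        self.velocity = np.asarray(velocity, dtype=float)
        self.mass = mass
        self.restitution = restitution  # bounciness in [0, 1]

def step(body, dt=1.0 / 60.0, gravity=np.array([0.0, -9.81, 0.0])):
    # Semi-implicit Euler: update velocity first, then position.
    body.velocity += gravity * dt
    body.position += body.velocity * dt

    # Resolve penetration against a ground plane at y = 0 by clamping
    # the position and reflecting the vertical velocity with damping.
    if body.position[1] < 0.0:
        body.position[1] = 0.0
        body.velocity[1] = -body.restitution * body.velocity[1]

# Example: drop a body from 2 m and simulate one second at 60 Hz.
ball = RigidBody(position=[0.0, 2.0, 0.0], velocity=[0.0, 0.0, 0.0])
for _ in range(60):
    step(ball)
```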

Benchmarking and Results

The system was benchmarked on scenes of varying complexity, including both indoor and large-scale outdoor environments. The results indicate that Video2Game produces highly realistic renderings at interactive frame rates. Moreover, the framework supports real-time interaction within these virtual spaces, fulfilling the promise of transforming ordinary video footage into rich, explorable, interactive worlds.

Implications and Future Directions

The introduction of Video2Game signifies an important advancement in the automation of virtual environment creation. For researchers, game developers, and content creators, the implications are far-reaching. This approach can drastically reduce the time and resources required to produce interactive 3D environments. It opens new possibilities for the rapid prototyping of game levels, simulation scenarios, and virtual experiences directly from real-world footage.

Looking forward, the integration of more complex physical interactions and the enhancement of semantic scene understanding represent promising areas for further development. Additionally, exploring the potential for dynamic scene changes and interactions in real-time within these virtual environments could further bridge the gap between virtual and physical worlds.

Conclusion

Video2Game presents an innovative approach to 3D content creation, leveraging the vast availability of video data to automatically generate interactive, realistic virtual environments. This work marks a significant step forward in the field of generative AI and virtual environment creation, offering a glimpse into the future of interactive media where the lines between the real and the virtual continue to blur. As the technology matures, it has the potential to democratize content creation, making sophisticated virtual experiences accessible to a wider range of creators and audiences alike.
