LLMR: Real-time Prompting of Interactive Worlds using Large Language Models (2309.12276v3)

Published 21 Sep 2023 in cs.HC, cs.AI, cs.CL, and cs.ET

Abstract: We present LLM for Mixed Reality (LLMR), a framework for the real-time creation and modification of interactive Mixed Reality experiences using LLMs. LLMR leverages novel strategies to tackle difficult cases where ideal training data is scarce, or where the design goal requires the synthesis of internal dynamics, intuitive analysis, or advanced interactivity. Our framework relies on text interaction and the Unity game engine. By incorporating techniques for scene understanding, task planning, self-debugging, and memory management, LLMR outperforms the standard GPT-4 by 4x in average error rate. We demonstrate LLMR's cross-platform interoperability with several example worlds, and evaluate it on a variety of creation and modification tasks to show that it can produce and edit diverse objects, tools, and scenes. Finally, we conducted a usability study (N=11) with a diverse set of participants, which revealed that they had positive experiences with the system and would use it again.
