
How People Prompt to Create Interactive VR Scenes (2402.10525v2)

Published 16 Feb 2024 in cs.HC

Abstract: Generative AI tools can provide people with the ability to create virtual environments and scenes with natural language prompts. Yet, how people will formulate such prompts is unclear, particularly when they inhabit the environment that they are designing. For instance, it is likely that a person might say, "Put a chair here", while pointing at a location. If such linguistic features are common to people's prompts, we need to tune models to accommodate them. In this work, we present a Wizard-of-Oz elicitation study with 22 participants, where we studied people's implicit expectations when verbally prompting such programming agents to create interactive VR scenes. Our findings show that people prompt with several implicit expectations: (1) that agents have an embodied knowledge of the environment; (2) that agents understand embodied prompts by users; (3) that agents can recall previous states of the scene and the conversation; and (4) that agents have a commonsense understanding of objects in the scene. Further, we found that participants prompt differently when they are prompting in situ (i.e., within the VR environment) versus ex situ (i.e., viewing the VR environment from the outside). To explore how our findings could be applied, we designed and built Oastaad, a conversational programming agent that allows non-programmers to design interactive VR experiences that they inhabit. Based on these explorations, we outline new opportunities and challenges for conversational programming agents that create VR environments.
