DirectGPT: A Direct Manipulation Interface to Interact with Large Language Models

Published 5 Oct 2023 in cs.HC | (2310.03691v2)

Abstract: We characterize and demonstrate how the principles of direct manipulation can improve interaction with LLMs. This includes: continuous representation of generated objects of interest; reuse of prompt syntax in a toolbar of commands; manipulable outputs to compose or control the effect of prompts; and undo mechanisms. This idea is exemplified in DirectGPT, a user interface layer on top of ChatGPT that works by transforming direct manipulation actions to engineered prompts. A study shows participants were 50% faster and relied on 50% fewer and 72% shorter prompts to edit text, code, and vector images compared to baseline ChatGPT. Our work contributes a validated approach to integrate LLMs into traditional software using direct manipulation. Data, code, and demo available at https://osf.io/3wt6s.
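As a rough illustration of what "transforming direct manipulation actions to engineered prompts" might look like, the sketch below turns a toolbar command applied to a selected fragment into a prompt string and keeps an undo stack of earlier states. It is not the DirectGPT implementation: the names `Action`, `Document`, `build_prompt`, `apply_result`, and `undo`, as well as the prompt wording, are assumptions made for this example, and the call to the LLM is left out.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Action:
    """Hypothetical direct manipulation action: a toolbar command on a selection."""
    command: str    # e.g. "shorten", picked from a toolbar instead of typed
    selection: str  # the fragment of the output the user selected


@dataclass
class Document:
    """The continuously displayed object (text, code, or SVG source)."""
    content: str
    history: List[str] = field(default_factory=list)  # undo stack of past states


def build_prompt(doc: Document, action: Action) -> str:
    """Turn an action into an engineered prompt (the wording is an assumption)."""
    return (
        f"Here is the current object:\n{doc.content}\n\n"
        f"Apply the command '{action.command}' only to this selected part:\n"
        f"{action.selection}\n\n"
        "Return the full updated object and nothing else."
    )


def apply_result(doc: Document, new_content: str) -> None:
    """Store the model's reply as the new state, remembering the old one."""
    doc.history.append(doc.content)
    doc.content = new_content


def undo(doc: Document) -> None:
    """Restore the previous state if there is one."""
    if doc.history:
        doc.content = doc.history.pop()
```

In use, the string returned by `build_prompt` would be sent to the model, and its reply passed to `apply_result`; `undo` then reverts the object without requiring a new prompt.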
