DirectGPT: A Direct Manipulation Interface to Interact with Large Language Models
Abstract: We characterize and demonstrate how the principles of direct manipulation can improve interaction with LLMs. These principles include: continuous representation of generated objects of interest; reuse of prompt syntax in a toolbar of commands; manipulable outputs to compose or control the effect of prompts; and undo mechanisms. This idea is exemplified in DirectGPT, a user interface layer on top of ChatGPT that works by transforming direct manipulation actions into engineered prompts. A study shows that participants were 50% faster and relied on 50% fewer and 72% shorter prompts to edit text, code, and vector images compared to baseline ChatGPT. Our work contributes a validated approach to integrating LLMs into traditional software using direct manipulation. Data, code, and demo are available at https://osf.io/3wt6s.
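The idea of transforming a direct manipulation action into an engineered prompt can be illustrated with a minimal sketch. This is not DirectGPT's implementation: the names `Selection`, `build_prompt`, and `History`, as well as the prompt wording, are hypothetical and chosen only to show how a selected object, a toolbar command, and an undo stack might fit together.

```python
from dataclasses import dataclass


@dataclass
class Selection:
    """A part of the generated object that the user pointed at (hypothetical)."""
    kind: str      # e.g. "text", "code", "svg-element"
    content: str   # the selected fragment itself


def build_prompt(document: str, selection: Selection, command: str) -> str:
    """Combine the document, the selection, and a toolbar command into one prompt.

    The template wording below is an assumption made for illustration.
    """
    return (
        f"Here is the current document:\n{document}\n\n"
        f"The user selected this {selection.kind}:\n{selection.content}\n\n"
        f"Apply this command to the selection only: {command}\n"
        "Return the full updated document and change nothing else."
    )


class History:
    """Keeps every document version so that an edit can be undone."""

    def __init__(self, initial: str) -> None:
        self._versions = [initial]

    def commit(self, new_version: str) -> None:
        # Store the result of an edit as the newest version.
        self._versions.append(new_version)

    def undo(self) -> str:
        # Drop the newest version unless only the initial one remains.
        if len(self._versions) > 1:
            self._versions.pop()
        return self._versions[-1]

    @property
    def current(self) -> str:
        return self._versions[-1]


# Usage: the user selects two words and clicks a (hypothetical) toolbar command.
doc = "The quick brown fox."
prompt = build_prompt(doc, Selection("text", "quick brown"), "make more formal")
print(prompt)
```

In a design like this, the user never types the long prompt: pointing at the object supplies the referent and the toolbar supplies the verb, which is one plausible reading of why the study's prompts were fewer and shorter. The string returned by `build_prompt` would then be sent to the model, and the reply passed to `History.commit`.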