LLM-for-X: Application-agnostic Integration of Large Language Models to Support Personal Writing Workflows (2407.21593v1)

Published 31 Jul 2024 in cs.HC

Abstract: To enhance productivity and to streamline workflows, there is a growing trend to embed LLM functionality into applications, from browser-based web apps to native apps that run on personal computers. Here, we introduce LLM-for-X, a system-wide shortcut layer that seamlessly augments any application with LLM services through a lightweight popup dialog. Our native layer seamlessly connects front-end applications to popular LLM backends, such as ChatGPT and Gemini, using their uniform chat front-ends as the programming interface or their custom API calls. We demonstrate the benefits of LLM-for-X across a wide variety of applications, including Microsoft Office, VSCode, and Adobe Acrobat as well as popular web apps such as Overleaf. In our evaluation, we compared LLM-for-X with ChatGPT's web interface in a series of tasks, showing that our approach can provide users with quick, efficient, and easy-to-use LLM assistance without context switching to support writing and reading tasks that is agnostic of the specific application.


Summary

  • The paper presents an application-agnostic interface that cut average editing task time by roughly 38% (31.71 s versus 51.14 s) compared with the ChatGPT web interface.
  • It employs a combination of OS-level hooks, accessibility APIs, and browser extensions to facilitate seamless LLM query handling and real-time diff view previews.
  • A 14-participant user study found higher System Usability Scale scores and lower NASA-TLX effort ratings than for the ChatGPT web interface across writing, reading, and coding tasks.

The paper introduces a system-wide interface that enables application-agnostic integration of LLMs into virtually any text-based software environment. The approach is realized through an OS-level background service that monitors for global keyboard shortcuts and facilitates interaction with LLM backends via a lightweight pop-up UI overlay. This design avoids the conventional copy–paste paradigm and minimizes context switching, thereby streamlining workflows across native applications and web apps alike.

The system architecture relies on several technical components:

  • OS-Level Background Service:

Implemented in C# using .NET APIs, the service registers a global keyboard hook to detect shortcut triggers. It leverages the Windows UI Automation API (UIA) to extract selected text and additional contextual information from foreground applications. In cases where UIA is unavailable, a clipboard fallback mechanism is employed.

  • Native and Web-Based Interfaces:

For native applications, the integration is achieved via accessibility APIs, where the service retrieves the current window’s properties (such as window title and Process ID) as contextual cues to augment the LLM prompt. For web applications, a dedicated browser extension uses DOM APIs to capture text selections and related content. Communication between the browser extension and the native service is handled via the Native Messaging API, ensuring a seamless user experience.

  • LLM Query Handling:

The system supports both interaction via emulated chat interfaces (mimicking standard LLM web UIs like those of ChatGPT, Gemini, etc.) and direct API calls to the LLM backends. In the emulated mode, the extension simulates user input and polls the DOM for the progressive appearance of the LLM’s response. The retrieved response is then previewed and can be inserted directly into the originating application through simulated keystrokes, preserving the contextual integrity of the text input (e.g., ensuring that the action is reflected in the application’s native undo/redo stack).

  • Interaction Design and Features:
    • Predefined Commands and Custom Query Input: Users can trigger standard actions (e.g., “fix spelling mistakes”, “explain”, “translate”) with numeric shortcuts, or type any specific query.
    • Diff View Previews and Iterative Refinement: For editing tasks, the system presents a side-by-side diff view that highlights changes, enabling users to refine prompts iteratively while maintaining conversational context.
    • Direct Insertion Versus Replacement: Depending on modifier keys (such as SHIFT), the system distinguishes between replacing selected text or appending the LLM response below the current selection.
    • Contextual Augmentation: The prompt sent to the LLM is padded with additional context that includes the application name and window title, along with surrounding text if configured by the user. This enhances the LLM’s capacity to generate contextually relevant responses.
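
As a rough illustration of the browser-to-service path and the contextual augmentation described above, the sketch below frames a captured selection as a Native Messaging frame (a 32-bit length in native byte order followed by UTF-8 JSON, as Chrome's protocol specifies) and then assembles a prompt from it. The payload field names (`app`, `title`, `selection`, `surrounding`), the numeric shortcut table, and the prompt layout are assumptions made for this example; the paper's actual message schema and prompt template are not given here.

```python
import json
import struct
from typing import Union

# Hypothetical numeric shortcuts; the commands are the examples named in the text.
COMMANDS = {
    1: "Fix spelling mistakes",
    2: "Explain",
    3: "Translate",
}


def encode_message(payload: dict) -> bytes:
    """Frame a payload as Native Messaging does: 4-byte length, then UTF-8 JSON."""
    body = json.dumps(payload).encode("utf-8")
    return struct.pack("=I", len(body)) + body


def decode_message(frame: bytes) -> dict:
    """Decode one complete frame produced by encode_message."""
    (length,) = struct.unpack("=I", frame[:4])
    return json.loads(frame[4:4 + length].decode("utf-8"))


def build_prompt(msg: dict, command: Union[int, str]) -> str:
    """Combine context, instruction, and selection into one prompt (assumed layout)."""
    # An int picks a predefined command; a str is treated as a custom query.
    instruction = COMMANDS[command] if isinstance(command, int) else command
    lines = [
        f"Application: {msg['app']}",
        f"Window title: {msg['title']}",
    ]
    # Surrounding text is optional, mirroring the user-configurable setting.
    if msg.get("surrounding"):
        lines.append(f"Surrounding text: {msg['surrounding']}")
    lines.append(f"Task: {instruction}")
    lines.append(f"Selected text: {msg['selection']}")
    return "\n".join(lines)


if __name__ == "__main__":
    captured = {"app": "Overleaf", "title": "paper.tex",
                "selection": "Teh cat sat.", "surrounding": ""}
    frame = encode_message(captured)          # what the extension would send
    print(build_prompt(decode_message(frame), 1))
```

Keeping the framing and the prompt assembly as separate functions means the same `build_prompt` could serve text obtained through accessibility APIs, where no browser message is involved.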

The evaluation comprises a controlled user study with 14 participants who performed writing, reading, and coding tasks using both the proposed system and the standard ChatGPT web interface. Key quantitative findings include:

  • Editing Task Performance:

A statistically significant reduction in task completion time was observed for editing tasks: an average of 31.71 seconds with the system versus 51.14 seconds with ChatGPT (p < 0.05). This is roughly a 38% reduction in completion time, or about a 1.6× speedup.

  • Usability and Effort Metrics:

The system’s usability was rated significantly higher on the System Usability Scale (SUS), with average scores of 62.54 compared to 51.68 for ChatGPT. Additionally, participants reported lower effort scores on the NASA Task Load Index (TLX), particularly in terms of ease of use.

  • Qualitative Feedback:

Users appreciated the elimination of context switching and the efficiency of keyboard shortcuts. Some users, however, noted that the familiar conversational style of ChatGPT provided a friendlier experience, indicating room for integrating more personalized features without compromising efficiency.
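
The relative improvements follow directly from the reported means. The short check below recomputes them; the variable names are illustrative, and only the four reported averages are taken from the text.

```python
# Reported mean editing-task completion times, in seconds.
system_s = 31.71
chatgpt_s = 51.14

# Fraction of time saved relative to the ChatGPT baseline.
time_reduction = (chatgpt_s - system_s) / chatgpt_s
# Equivalent speedup factor (baseline time divided by new time).
speedup = chatgpt_s / system_s

# Reported mean System Usability Scale scores.
sus_system = 62.54
sus_chatgpt = 51.68
sus_gain = sus_system - sus_chatgpt

print(f"time reduction: {time_reduction:.1%}")   # time reduction: 38.0%
print(f"speedup: {speedup:.2f}x")                # speedup: 1.61x
print(f"SUS difference: {sus_gain:.2f} points")  # SUS difference: 10.86 points
```

Note that a 38% reduction in time and a 1.61× speedup describe the same result; the two figures should not be used interchangeably.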

The paper also details the implementation nuances, such as simulating keypress events (using functions like SendKeys.SendWait) to ensure that LLM responses integrate natively into the target application’s editing flow. Moreover, the system is designed to recognize when a target element is non-editable and adapts its UI (by, for instance, hiding the TAB button) accordingly.
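
The insertion behaviour can be modelled as a small pure function over strings. This is only a sketch: the real system inserts text through simulated keystrokes rather than string slicing, and the function name, the `mode` values, and the choice to return `None` for read-only targets are assumptions made for illustration. Which modifier key selects which mode is not specified here, so the mode is passed explicitly.

```python
from typing import Optional


def apply_response(text: str, start: int, end: int, response: str,
                   mode: str = "replace", editable: bool = True) -> Optional[str]:
    """Return the text after inserting an LLM response, or None if nothing is inserted.

    text[start:end] is the user's current selection.
    """
    if not editable:
        # Read-only target (e.g. a PDF viewer): the response is only previewed.
        return None
    if mode == "replace":
        # Overwrite the selection with the response.
        return text[:start] + response + text[end:]
    if mode == "append":
        # Keep the selection and place the response on a new line below it.
        return text[:end] + "\n" + response + text[end:]
    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    doc = "Teh cat sat."
    print(apply_response(doc, 0, 7, "The cat"))
    print(apply_response(doc, 0, 7, "The cat", mode="append"))
    print(apply_response(doc, 0, 7, "The cat", editable=False))
```

Returning `None` for non-editable targets gives the UI a single signal for hiding its insert control, matching the adaptive behaviour described above.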

In summary, the paper demonstrates that the proposed system can significantly enhance text manipulation tasks by providing efficient in-situ LLM assistance across diverse applications. The technical contributions of integrating accessibility APIs, browser extensions, and LLM backends via both emulated interactions and direct API calls together form a robust framework for extending LLM services without the need for application-specific subscriptions. The work lays a solid foundation for further enhancements, such as incorporating multimodal context or personalized interaction cues, to further streamline user workflows in varied computing environments.
