
Rambler: Supporting Writing With Speech via LLM-Assisted Gist Manipulation (2401.10838v2)

Published 19 Jan 2024 in cs.HC

Abstract: Dictation enables efficient text input on mobile devices. However, writing with speech can produce disfluent, wordy, and incoherent text and thus requires heavy post-processing. This paper presents Rambler, an LLM-powered graphical user interface that supports gist-level manipulation of dictated text with two main sets of functions: gist extraction and macro revision. Gist extraction generates keywords and summaries as anchors to support the review and interaction with spoken text. LLM-assisted macro revisions allow users to respeak, split, merge and transform dictated text without specifying precise editing locations. Together they pave the way for interactive dictation and revision that help close gaps between spontaneous spoken words and well-structured writing. In a comparative study with 12 participants performing verbal composition tasks, Rambler outperformed the baseline of a speech-to-text editor + ChatGPT, as it better facilitates iterative revisions with enhanced user control over the content while supporting surprisingly diverse user strategies.


Summary

  • The paper presents Rambler, an LLM-powered GUI that supports writing with speech through two sets of functions: gist extraction and macro revision.
  • Gist extraction generates keywords and summaries that anchor review of the dictated text; macro revision lets users respeak, split, merge, and transform text without specifying precise editing locations.
  • In a comparative study with 12 participants, Rambler outperformed a baseline of a speech-to-text editor plus ChatGPT, better supporting iterative revision and user control over content.

Analyzing "Rambler: Supporting Writing With Speech via LLM-Assisted Gist Manipulation"

The paper "Rambler: Supporting Writing With Speech via LLM-Assisted Gist Manipulation" addresses a challenge in human-computer interaction: turning spoken language into well-structured written content. Dictation is an efficient way to enter text on mobile devices, but spontaneous speech tends to be disfluent, wordy, and incoherent, so dictated text usually needs heavy post-processing.

Core Contributions and Methodology

Rambler is introduced as an LLM-powered graphical user interface designed to assist users in effective speech-to-text writing, capitalizing on recent advancements in LLMs. The novel interface diverges from traditional speech-to-text systems by focusing on "gist-based" manipulations—conceptual chunks that facilitate users' management of their spoken content. Rambler comprises two pivotal components: gist extraction and macro revision.

Gist Extraction

This process involves generating keywords and summaries from the raw transcriptions—these elements serve as navigational anchors, improving users' ability to review and comprehend their dictated text. By abstracting longer text into concise summaries and focusing on key ideas, Rambler reduces cognitive load, aiding users in identifying and reorganizing central concepts more effectively.
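The gist-extraction step described above can be sketched roughly as follows. This is a minimal illustration and not the paper's implementation: the prompt wording, the JSON reply format, the `Gist` type, and the `llm` callable (any function that maps a prompt string to a completion string) are all assumptions made for the example, and `fake_llm` is a stand-in so the snippet runs without a model.

```python
import json
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Gist:
    """Keywords and a short summary that act as anchors for one dictated passage."""
    keywords: List[str]
    summary: str


# Hypothetical prompt; the paper's actual prompts are not given in this summary.
PROMPT_TEMPLATE = (
    "Extract the gist of the dictated text below.\n"
    'Reply with JSON only: {{"keywords": [...], "summary": "..."}}\n\n'
    "Text:\n{text}"
)


def extract_gist(text: str, llm: Callable[[str], str]) -> Gist:
    """Ask the (assumed) LLM for keywords and a summary, then parse its JSON reply."""
    reply = llm(PROMPT_TEMPLATE.format(text=text))
    data = json.loads(reply)
    return Gist(
        keywords=[str(k) for k in data["keywords"]],
        summary=str(data["summary"]),
    )


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call, used only to make the sketch runnable."""
    return json.dumps(
        {"keywords": ["trip", "budget"], "summary": "Planning a cheap trip."}
    )


gist = extract_gist(
    "um so I was thinking we could, like, plan a trip on a budget", fake_llm
)
print(gist.keywords, "|", gist.summary)
```

Passing the model in as a plain callable keeps the sketch independent of any particular LLM provider; a real system would also need to handle replies that are not valid JSON.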

Macro Revision

Through LLM-powered tools, users can perform high-level text manipulations such as respeaking, splitting, merging, and transforming chunks of text without specifying precise edit locations. This lets users rework their spoken input at a macro level, supporting iterative refinement that resembles a conventional writing process. Dictated content is organized into distinct segments called "Rambles", and respeaking individual segments lets users build structurally coherent text step by step.
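As a rough illustration of macro revision, the sketch below models a draft as a list of text segments and implements merge and split as operations that delegate the actual rewriting to a model. It is a sketch under stated assumptions, not Rambler's code: the function names, prompts, the blank-line convention for split replies, and the `LLM` callable type are all hypothetical, and the two stub functions stand in for real model calls.

```python
from typing import Callable, List

# Hypothetical model interface: prompt string in, completion string out.
LLM = Callable[[str], str]


def merge(rambles: List[str], i: int, j: int, llm: LLM) -> List[str]:
    """Merge segments i and j (i < j) into one segment placed at position i."""
    if not 0 <= i < j < len(rambles):
        raise IndexError("need 0 <= i < j < len(rambles)")
    prompt = (
        "Merge these two passages into one coherent paragraph:\n\n"
        f"{rambles[i]}\n\n{rambles[j]}"
    )
    merged = llm(prompt).strip()
    return rambles[:i] + [merged] + rambles[i + 1:j] + rambles[j + 1:]


def split(rambles: List[str], i: int, llm: LLM) -> List[str]:
    """Split segment i by topic; the model, not the user, picks the boundaries."""
    prompt = (
        "Split this passage into topical parts, separated by blank lines:\n\n"
        + rambles[i]
    )
    parts = [p.strip() for p in llm(prompt).split("\n\n") if p.strip()]
    # Fall back to the original segment if the reply contains no usable parts.
    return rambles[:i] + (parts or [rambles[i]]) + rambles[i + 1:]


# Stubs standing in for real model calls, so the sketch runs as-is.
def stub_merge_llm(prompt: str) -> str:
    return "I want a cheap trip to Lisbon in May."


def stub_split_llm(prompt: str) -> str:
    return "Book flights early.\n\nAsk Sam about the hostel."


draft = [
    "I want a trip.",
    "Lisbon in May, cheap.",
    "Book flights early and ask Sam about the hostel.",
]
draft = merge(draft, 0, 1, stub_merge_llm)
draft = split(draft, 1, stub_split_llm)
print(draft)
```

Both operations take only segment indices, which mirrors the idea that the user selects chunks rather than character-level edit positions. Each call returns a new list, so earlier versions of the draft remain available for undo.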

Evaluation and Findings

In a comparative study involving 12 participants, Rambler was evaluated against a baseline combination of a speech-to-text editor and ChatGPT. Analysis of the user study indicates that Rambler outperforms this baseline, particularly in supporting iterative revisions and giving users more control over their content. Participants also adopted a surprisingly diverse range of revision strategies, which Rambler's flexible, gist-centric affordances made possible.

Implications and Future Directions

The theoretical implications of this work suggest a promising avenue for integrating LLMs more meaningfully into human-computer interaction interfaces. By prioritizing semantic content over verbatim transcription, Rambler exemplifies how AI can facilitate more natural, fluent interactions with technology. Practically, it points toward an innovative design approach for mobile writing applications, potentially reducing the barriers to starting and iterating on long-form text compositions.

Future explorations could expand upon this functionality to encompass other input modalities or develop more granular user customization capabilities. Additionally, as LLM efficiency and accuracy continue to improve, the latency issues observed within the real-time application of LLMs for dynamic text manipulation may diminish, paving the way for seamless integration in various writing contexts.

In conclusion, the development and evaluation of Rambler represent a thoughtful application of state-of-the-art AI to address a tangible usability challenge. Its emphasis on integrating conceptual-level interactions within conventional GUI frameworks marks a significant contribution toward the seamless integration of speech into our digital writing habits.
