Rambler: Supporting Writing With Speech via LLM-Assisted Gist Manipulation (2401.10838v2)
Abstract: Dictation enables efficient text input on mobile devices. However, writing with speech can produce disfluent, wordy, and incoherent text that requires heavy post-processing. This paper presents Rambler, an LLM-powered graphical user interface that supports gist-level manipulation of dictated text through two main sets of functions: gist extraction and macro revision. Gist extraction generates keywords and summaries that serve as anchors for reviewing and interacting with spoken text. LLM-assisted macro revisions let users respeak, split, merge, and transform dictated text without specifying precise editing locations. Together, these functions pave the way for interactive dictation and revision that help close the gap between spontaneous spoken words and well-structured writing. In a comparative study with 12 participants performing verbal composition tasks, Rambler outperformed a baseline consisting of a speech-to-text editor plus ChatGPT: it better facilitated iterative revision and gave users more control over the content, while supporting surprisingly diverse user strategies.
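The abstract describes gist extraction and macro revision only at a high level. The Python sketch below is not Rambler's implementation. It assumes that dictated text is held as a list of chunks, each carrying its own keywords and summary. Where Rambler uses an LLM, the sketch substitutes simple heuristics: word-frequency keywords, a first-sentence "summary", and a split at the middle sentence boundary. The names `Chunk`, `make_chunk`, `split_chunk`, `merge_chunks`, `FILLERS`, and `STOPWORDS` are hypothetical. The sketch is meant to show one possible data flow, in which a split or merge recomputes the gist of every chunk it touches.

```python
from __future__ import annotations

import re
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical word lists: a crude stand-in for the LLM, not part of Rambler.
FILLERS = {"um", "uh", "erm"}
STOPWORDS = {
    "the", "a", "an", "and", "or", "but", "so", "to", "of", "in", "on",
    "for", "with", "is", "are", "was", "be", "it", "that", "this", "i", "we",
}
IGNORED = STOPWORDS | FILLERS


def split_sentences(text: str) -> list[str]:
    """Split on whitespace that follows '.', '!' or '?'."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def extract_keywords(text: str, k: int = 3) -> list[str]:
    """Word-frequency keywords (heuristic stand-in for LLM keyword extraction)."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in IGNORED)
    # Most frequent first; ties broken alphabetically so output is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [w for w, _ in ranked[:k]]


def summarize(text: str) -> str:
    """Placeholder summary: the first sentence with filler words dropped."""
    sentences = split_sentences(text)
    if not sentences:
        return ""
    kept = [w for w in sentences[0].split() if w.strip(",.!?").lower() not in FILLERS]
    return " ".join(kept)


@dataclass
class Chunk:
    """One unit of dictated text plus its gist (keywords and summary)."""
    text: str
    keywords: list[str] = field(default_factory=list)
    summary: str = ""


def make_chunk(text: str) -> Chunk:
    """Gist extraction: attach keywords and a summary to raw dictated text."""
    return Chunk(text, extract_keywords(text), summarize(text))


def split_chunk(chunk: Chunk) -> list[Chunk]:
    """Split at the middle sentence boundary and recompute both gists.
    (An LLM could instead pick a topical break; this is only a heuristic.)"""
    sentences = split_sentences(chunk.text)
    if len(sentences) < 2:
        return [chunk]
    mid = len(sentences) // 2
    return [make_chunk(" ".join(sentences[:mid])),
            make_chunk(" ".join(sentences[mid:]))]


def merge_chunks(a: Chunk, b: Chunk) -> Chunk:
    """Concatenate two chunks and recompute the gist of the result."""
    return make_chunk(a.text.rstrip() + " " + b.text.lstrip())


if __name__ == "__main__":
    dictated = ("Um so the garden needs water every morning. "
                "Tomatoes grow fast in summer. Tomatoes need sun.")
    chunk = make_chunk(dictated)
    print(chunk.keywords, "|", chunk.summary)
    for part in split_chunk(chunk):
        print(part.keywords, "|", part.summary)
```

Running the module prints the gist of the whole passage and then the gist of each half after a split. Rambler's macro revisions are LLM-assisted and also include respeaking and transforming text. This sketch does not attempt either of those operations.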