Rewriting the Script: Adapting Text Instructions for Voice Interaction (2306.09992v1)

Published 16 Jun 2023 in cs.HC and cs.CL

Abstract: Voice assistants have sharply risen in popularity in recent years, but their use has been limited mostly to simple applications like music, hands-free search, or control of internet-of-things devices. What would it take for voice assistants to guide people through more complex tasks? In our work, we study the limitations of the dominant approach voice assistants take to complex task guidance: reading aloud written instructions. Using recipes as an example, we observe twelve participants cook at home with a state-of-the-art voice assistant. We learn that the current approach leads to nine challenges, including obscuring the bigger picture, overwhelming users with too much information, and failing to communicate affordances. Instructions delivered by a voice assistant are especially difficult because they cannot be skimmed as easily as written instructions. Alexa in particular did not surface crucial details to the user or answer questions well. We draw on our observations to propose eight ways in which voice assistants can "rewrite the script" (summarizing, signposting, splitting, elaborating, volunteering, reordering, redistributing, and visualizing) to transform written sources into forms that are readily communicated through spoken conversation. We conclude with a vision of how modern advancements in natural language processing can be leveraged for intelligent agents to guide users effectively through complex tasks.
