Shaping Human-AI Collaboration: Varied Scaffolding Levels in Co-writing with Language Models (2402.11723v1)
Abstract: Advances in language modeling have paved the way for novel human-AI co-writing experiences. This paper explores how varying levels of scaffolding from LLMs shape the co-writing process. Employing a within-subjects field experiment with a Latin square design, we asked participants (N=131) to respond to argumentative writing prompts under three randomly sequenced conditions: no AI assistance (control), next-sentence suggestions (low scaffolding), and next-paragraph suggestions (high scaffolding). Our findings reveal a U-shaped impact of scaffolding on writing quality and productivity (words/time). While low scaffolding did not significantly improve writing quality or productivity, high scaffolding led to significant improvements, especially benefiting non-regular writers and less tech-savvy users. No significant cognitive burden was observed while using the scaffolded writing tools, but a moderate decrease in text ownership and satisfaction was noted. Our results have broad implications for the design of AI-powered writing tools, including the need for personalized scaffolding mechanisms.
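The counterbalancing idea behind a Latin square design can be illustrated with a minimal sketch. It is not the authors' procedure: the condition labels, the cyclic construction, the round-robin assignment of participants to rows, and the words-per-minute reading of "words/time" are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical labels for the three conditions named in the abstract.
CONDITIONS = ["control", "low_scaffolding", "high_scaffolding"]


def latin_square(items):
    """Cyclic Latin square: row i is the list rotated left by i positions."""
    n = len(items)
    return [[items[(i + j) % n] for j in range(n)] for i in range(n)]


def assign_orders(num_participants, items=CONDITIONS):
    """Assign each participant one row of the square, cycling through rows.

    Assumption: round-robin assignment; the study may have randomized
    which row each participant received.
    """
    square = latin_square(items)
    return [square[p % len(square)] for p in range(num_participants)]


def productivity(text, minutes):
    """Words per minute, one plausible reading of 'words/time'."""
    return len(text.split()) / minutes


orders = assign_orders(131)
# How often each condition appears in each serial position (1st, 2nd, 3rd).
position_counts = [Counter(order[pos] for order in orders) for pos in range(3)]
```

Because every condition appears once in every row and column of the square, each condition is seen about equally often in each position across participants, which is what lets order effects be separated from the effect of scaffolding level.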
Authors: Paramveer S. Dhillon, Somayeh Molaei, Jiaqi Li, Maximilian Golub, Shaochun Zheng, Lionel P. Robert