ReviewFlow: Intelligent Scaffolding to Support Academic Peer Reviewing (2402.03530v2)

Published 5 Feb 2024 in cs.HC

Abstract: Peer review is a cornerstone of science. Research communities conduct peer reviews to assess contributions and to improve the overall quality of science work. Every year, new community members are recruited as peer reviewers for the first time. How could technology help novices adhere to their community's practices and standards for peer reviewing? To better understand peer review practices and challenges, we conducted a formative study with 10 novices and 10 experts. We found that many experts adopt a workflow of annotating, note-taking, and synthesizing notes into well-justified reviews that align with community standards. Novices lack timely guidance on how to read and assess submissions and how to structure paper reviews. To support the peer review process, we developed ReviewFlow -- an AI-driven workflow that scaffolds novices with contextual reflections to critique and annotate submissions, in-situ knowledge support to assess novelty, and notes-to-outline synthesis to help align peer reviews with community expectations. In a within-subjects experiment, 16 inexperienced reviewers wrote reviews in two conditions: using ReviewFlow and using a baseline environment with minimal guidance. With ReviewFlow, participants produced more comprehensive reviews, identifying more pros and cons. While participants appreciated the streamlined process support from ReviewFlow, they also expressed concerns about using AI as part of the scientific review process. We discuss the implications of using AI to scaffold the peer review process on scientific work and beyond.
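The notes-to-outline synthesis step described in the abstract can be pictured as a single LLM prompting pass over a reviewer's annotations. The sketch below is an illustrative assumption, not the authors' implementation: `llm_complete` is a hypothetical stand-in for whatever completion API ReviewFlow actually uses, and the section headings follow a common conference review template rather than anything specified in the paper.

```python
# Illustrative sketch of notes-to-outline synthesis (assumed, not ReviewFlow's actual code).
# `llm_complete` is a hypothetical callable (prompt: str) -> str backed by any LLM.
from typing import Callable, List

REVIEW_SECTIONS = ["Summary", "Strengths", "Weaknesses", "Questions"]  # assumed template

def synthesize_outline(notes: List[str], llm_complete: Callable[[str], str]) -> str:
    """Group a reviewer's in-situ annotation notes into a structured review outline."""
    prompt = (
        "You are assisting a novice peer reviewer. Organize the notes below into an "
        f"outline with the sections {', '.join(REVIEW_SECTIONS)}, keeping each note's "
        "meaning intact and flagging notes that need more supporting evidence.\n\n"
        + "\n".join(f"- {note}" for note in notes)
    )
    return llm_complete(prompt)

# Usage: outline = synthesize_outline(["Scaffolding idea is novel", "Only 16 participants"], my_llm)
```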

Authors (4)
  1. Lu Sun (25 papers)
  2. Aaron Chan (44 papers)
  3. Yun Seo Chang (1 paper)
  4. Steven P. Dow (4 papers)
Citations (4)