Prompting LLMs to Compose Meta-Review Drafts from Peer-Review Narratives of Scholarly Manuscripts (2402.15589v1)
Abstract: Composing meta-reviews is one of the most important yet onerous tasks in the academic peer-review process: it requires understanding a manuscript's core contributions, strengths, and weaknesses from the peer-review narratives of multiple experts, and then distilling those perspectives into a concise, holistic overview. Given recent major advances in generative AI, especially large language models (LLMs), it is compelling to rigorously study the utility of LLMs for generating such meta-reviews in an academic peer-review setting. In this paper, we perform a case study with three popular LLMs, i.e., GPT-3.5, LLaMA2, and PaLM2, automatically generating meta-reviews by prompting them with different types/levels of prompts based on the recently proposed TELeR taxonomy. Finally, we perform a detailed qualitative study of the meta-reviews generated by the LLMs and summarize our findings and recommendations for prompting LLMs for this complex task.
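The idea of prompting at different TELeR levels can be sketched as follows. This is a minimal illustration, not the paper's actual templates: the function name, level semantics, and directive wording are all assumptions, loosely following TELeR's principle that higher levels add more detailed sub-task and output specifications to the prompt.

```python
# Hypothetical sketch of TELeR-style prompt construction for meta-review
# generation. Level semantics and wording are illustrative only.

def build_meta_review_prompt(reviews: list[str], level: int) -> str:
    """Assemble a meta-review prompt from peer-review narratives.

    `level` loosely mirrors TELeR's idea of increasing prompt detail:
    1 = one-sentence directive,
    2 = adds explicit sub-tasks (contributions, strengths, weaknesses),
    3 = additionally constrains the output format.
    """
    joined = "\n\n".join(
        f"Review {i + 1}:\n{text}" for i, text in enumerate(reviews)
    )
    directive = "Write a meta-review summarizing the peer reviews below."
    if level >= 2:
        directive += (
            " Cover the manuscript's core contributions,"
            " strengths, and weaknesses."
        )
    if level >= 3:
        directive += " Limit the meta-review to one concise paragraph."
    return f"{directive}\n\n{joined}"


# The resulting string would be sent to an LLM (GPT-3.5, LLaMA2, or PaLM2).
prompt = build_meta_review_prompt(
    ["The method is novel but under-evaluated.", "Experiments are limited."],
    level=3,
)
print(prompt.splitlines()[0])
```

In this sketch, each level strictly extends the previous one, so prompts at different levels differ only in how much task detail the model receives, which is what makes a level-wise comparison meaningful.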
Authors:
- Shubhra Kanti Karmaker Santu
- Sanjeev Kumar Sinha
- Naman Bansal
- Alex Knipper
- Souvika Sarkar
- John Salvador
- Yash Mahajan
- Sri Guttikonda
- Mousumi Akter
- Matthew Freestone
- Matthew C. Williams Jr