ChatCite: LLM Agent with Human Workflow Guidance for Comparative Literature Summary (2403.02574v1)
Abstract: The literature review is an indispensable step in the research process. It helps researchers comprehend the research problem and understand the current state of the field while conducting a comparative analysis of prior works. However, literature summarization is challenging and time-consuming. Previous LLM-based studies on literature review have mainly focused on the complete process, including literature retrieval, screening, and summarization. For the summarization step, however, simple CoT methods often lack the ability to produce extensive comparative summaries. In this work, we first focus on the independent literature summarization step and introduce ChatCite, an LLM agent with human workflow guidance for comparative literature summary. By mimicking the human workflow, the agent first extracts key elements from the relevant literature and then generates summaries using a Reflective Incremental Mechanism. To better evaluate the quality of the generated summaries, we devised an LLM-based automatic evaluation metric, G-Score, with reference to human evaluation criteria. In our experiments, the ChatCite agent outperformed other models along various dimensions. The literature summaries generated by ChatCite can also be used directly for drafting literature reviews.
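The two-step workflow described in the abstract (key-element extraction followed by a Reflective Incremental Mechanism) can be sketched as follows. This is a minimal illustration, not the paper's implementation: `call_llm`, `extract_key_elements`, and `reflective_incremental_summary` are hypothetical names, and `call_llm` is a stub standing in for any chat-completion API; the paper's actual prompts and mechanism details are not given in this abstract.

```python
def call_llm(prompt: str) -> str:
    # Placeholder: in practice this would call an LLM API.
    return f"[LLM output for: {prompt[:40]}...]"

def extract_key_elements(paper_text: str) -> str:
    # Step 1: mimic the human workflow by extracting key elements
    # (e.g. problem, method, results) from each relevant paper.
    return call_llm(f"Extract the key elements from this paper:\n{paper_text}")

def reflective_incremental_summary(papers: list[str]) -> str:
    # Step 2: fold papers in one at a time, reflecting on and revising
    # the running comparative summary (the Reflective Incremental Mechanism,
    # as we interpret it from the abstract).
    summary = ""
    for paper in papers:
        elements = extract_key_elements(paper)
        draft = call_llm(
            f"Current comparative summary:\n{summary}\n\n"
            f"Key elements of a new paper:\n{elements}\n\n"
            "Revise the comparative summary to incorporate this paper."
        )
        critique = call_llm(f"Critique this summary's comparative analysis:\n{draft}")
        summary = call_llm(f"Improve the draft given this critique:\n{critique}\n\n{draft}")
    return summary

print(reflective_incremental_summary(["Paper A text", "Paper B text"]))
```

The incremental loop keeps the running summary short enough to fit in the model's context while still comparing each new paper against everything summarized so far.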