How Ready are Pre-trained Abstractive Models and LLMs for Legal Case Judgement Summarization? (2306.01248v2)
Abstract: Automatic summarization of legal case judgements has traditionally been attempted using extractive summarization methods. In recent years, however, abstractive summarization models have gained popularity because they can generate more natural and coherent summaries. Legal domain-specific pre-trained abstractive summarization models are now available. Moreover, general-domain pre-trained LLMs, such as ChatGPT, are known to generate high-quality text and are capable of text summarization. Hence it is natural to ask whether these models are ready for off-the-shelf application to automatically generate abstractive summaries of case judgements. To explore this question, we apply several state-of-the-art domain-specific abstractive summarization models and general-domain LLMs to Indian court case judgements and check the quality of the generated summaries. In addition to standard metrics for summary quality, we check for inconsistencies and hallucinations in the summaries. We see that abstractive summarization models generally achieve slightly higher scores than extractive models on standard summary evaluation metrics such as ROUGE and BLEU. However, we often find inconsistent or hallucinated information in the generated abstractive summaries. Overall, our investigation indicates that pre-trained abstractive summarization models and LLMs are not yet ready for fully automatic deployment for case judgement summarization; rather, a human-in-the-loop approach that includes manual checks for inconsistencies is more suitable at present.
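To make the two kinds of checks mentioned in the abstract concrete, the following is a minimal sketch in Python. It is not the authors' evaluation code: `rouge1` is a simplified unigram-overlap score in the spirit of ROUGE-1 (whitespace tokenization, no stemming), and `unsupported_numbers` is a hypothetical toy check that flags numbers appearing in a summary but not in the source judgement. Both function names and the example sentences are assumptions made for illustration.

```python
import re
from collections import Counter


def rouge1(candidate: str, reference: str) -> dict:
    """Simplified ROUGE-1: clipped unigram overlap between two texts."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((cand & ref).values())
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def unsupported_numbers(source: str, summary: str) -> list:
    """Toy consistency check (an assumption, not the paper's method):
    return numbers found in the summary that never occur in the source."""
    pattern = r"\d+(?:\.\d+)?"
    source_numbers = set(re.findall(pattern, source))
    summary_numbers = set(re.findall(pattern, summary))
    return sorted(summary_numbers - source_numbers)


if __name__ == "__main__":
    scores = rouge1("the appeal was dismissed", "the court dismissed the appeal")
    print(scores)

    source = "The appellant was sentenced to 7 years in 1998."
    summary = "The appellant was sentenced to 10 years in 1998."
    print(unsupported_numbers(source, summary))
```

A summary can score well on overlap metrics and still contain a wrong number or name, which is why a separate consistency check, followed by human review, is useful.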
Authors: Aniket Deroy, Kripabandhu Ghosh, Saptarshi Ghosh